In a smooth semiparametric estimation problem, the marginal
posterior for the parameter of interest is expected to be asymptotically
normal and satisfy frequentist criteria of optimality if the
model is endowed with a suitable prior. It is shown that, under certain
straightforward and interpretable conditions, the assertion of Le
Cam's acclaimed, but strictly parametric, Bernstein-von Mises theorem
[Univ. California Publ. Statist. 1 (1953) 277-329] holds in the
semiparametric situation as well. As a consequence, Bayesian point-estimators
achieve efficiency, for example, in the sense of Hájek's
convolution theorem [Z. Wahrsch. Verw. Gebiete 14 (1970) 323-330].
The model is required to satisfy differentiability and metric entropy
conditions, while the nuisance prior must assign nonzero mass to
certain Kullback-Leibler neighborhoods [Ghosal, Ghosh and van der
Vaart Ann. Statist. 28 (2000) 500-531]. In addition, the marginal posterior
is required to converge at parametric rate, which appears to be
the most stringent condition in examples. The results are applied to
estimation of the linear coefficient in partial linear regression, with
a Gaussian prior on a smoothness class for the nuisance.



1. Introduction. The concept of efficiency has its origin in Fisher's 1920s
claim of asymptotic optimality of the maximum-likelihood estimator in
differentiable parametric models (Fisher [13]). In the 1930s and 1940s, Fisher's
ideas on optimality in differentiable models were sharpened and elaborated
upon (see, e.g., Cramér [10]), until Hodges's 1951 discovery of a superefficient
estimator indicated that a comprehensive understanding of optimality
in differentiable estimation problems remained elusive. Further consideration
directed attention to the property of regularity to delimit the class of
estimators over which optimality is achieved. Hájek's convolution theorem
(Hájek [17]) implies that within the class of regular estimates, asymptotic
variance is lower-bounded by the Cramér-Rao bound in the limit experiment
[29]. The asymptotic minimax theorem (Hájek [18]) underlines the
central role of the concept of regularity. An estimator that is optimal among
regular estimates is called best-regular; in a Hellinger differentiable model,
an estimator $(\hat\theta_n)$ for $\theta$ is best-regular if and only if it is
asymptotically linear, that is, for all $\theta$ in the model,

(1.1)  $\sqrt{n}(\hat\theta_n - \theta) = \frac{1}{\sqrt{n}} \sum_{i=1}^n I_\theta^{-1} \dot\ell_\theta(X_i) + o_{P_\theta}(1)$,

where $\dot\ell_\theta$ is the score for $\theta$ and $I_\theta$ the corresponding Fisher information.
To address the question of efficiency in smooth parametric models from 
a Bayesian perspective, we turn to the Bernstein-von Mises theorem. In 
the literature many different versions of the theorem exist, varying both in 
(stringency of) conditions and (strength or) form of the assertion. Following 
Le Cam and Yang [31] (see also van der Vaart [43]), we state the theorem 
as follows. (For later reference, define a prior to be thick at $\theta_0$, if it has
a Lebesgue density that is continuous and strictly positive at $\theta_0$.)

Theorem 1.1 (Bernstein-von Mises, parametric). Assume that $\Theta \subset \mathbb{R}^k$
is open and that the model $\mathscr{P} = \{P_\theta : \theta \in \Theta\}$ is identifiable and dominated.
Suppose $X_1, X_2, \ldots$ forms an i.i.d. sample from $P_{\theta_0}$ for some $\theta_0 \in \Theta$. Assume
that the model is locally asymptotically normal at $\theta_0$ with nonsingular Fisher
information $I_{\theta_0}$. Furthermore, suppose that:

(i) the prior $\Pi_\Theta$ is thick at $\theta_0$;
(ii) for every $\varepsilon > 0$, there exists a test sequence $(\phi_n)$ such that

$P^n_{\theta_0} \phi_n \to 0, \qquad \sup_{\|\theta - \theta_0\| > \varepsilon} P^n_\theta (1 - \phi_n) \to 0$.

Then the posterior distributions converge in total variation,

$\sup_B \bigl| \Pi(\theta \in B \mid X_1, \ldots, X_n) - N_{\hat\theta_n, (n I_{\theta_0})^{-1}}(B) \bigr| \to 0$

in $P_{\theta_0}$-probability, where $(\hat\theta_n)$ denotes any best-regular estimator sequence.

For a proof, the reader is referred to [31, 43] (or to Kleijn and van der
Vaart [26], for a proof under model misspecification that has a lot in common
with the proof of Theorem 5.1 below).
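
To make the assertion of Theorem 1.1 concrete, the following minimal sketch computes the total-variation distance between a posterior and its normal approximation in a toy case. Everything in it is an illustrative assumption rather than part of the paper: the Bernoulli model with a uniform prior on (0, 1), the plug-in of the maximum-likelihood estimate into the Fisher information, the fixed success fraction 0.3, and the function names.

```python
import math


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    log_pdf = ((a - 1) * math.log(x) + (b - 1) * math.log1p(-x)
               + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    return math.exp(log_pdf)


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def normal_cdf(x, mean, var):
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))


def tv_posterior_vs_normal(n, successes, grid=20000):
    """Total-variation distance between the Beta posterior (uniform prior)
    and N(theta_hat, (n * I)^{-1}), by midpoint-rule integration on (0, 1)."""
    a, b = 1 + successes, 1 + n - successes
    theta_hat = successes / n                  # maximum-likelihood estimate
    var = theta_hat * (1 - theta_hat) / n      # plug-in inverse Fisher information / n
    step = 1.0 / grid
    l1 = sum(abs(beta_pdf((i + 0.5) * step, a, b)
                 - normal_pdf((i + 0.5) * step, theta_hat, var))
             for i in range(grid)) * step
    # Normal mass outside (0, 1), where the posterior has no mass.
    outside = normal_cdf(0.0, theta_hat, var) + 1.0 - normal_cdf(1.0, theta_hat, var)
    return 0.5 * (l1 + outside)


# Hypothetical data: a fixed fraction 0.3 of successes at each sample size.
for n in (40, 400, 4000):
    print(n, round(tv_posterior_vs_normal(n, 3 * n // 10), 4))
```

Under these assumptions the printed distances shrink as n grows, which is the behavior the theorem describes; the sketch says nothing about the semiparametric case.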

Neither the frequentist theory on asymptotic optimality nor Theorem 1.1
generalizes fully to nonparametric estimation problems. Examples of the
failure of the Bernstein-von Mises limit in infinite-dimensional problems (with
regard to the full parameter) can be found in Freedman [14]. Freedman
initiated a discussion concerning the merits of Bayesian methods in non- 
parametric problems as early as 1963, showing that even with a natural and 
seemingly innocuous choice of the nonparametric prior, posterior inconsis- 
tency may result [15]. This warning against instances of inconsistency due to 
ill-advised nonparametric priors was reiterated in the literature many times 
over, for example, in Cox [9] and in Diaconis and Freedman [11, 12]. However, 
general conditions for Bayesian consistency were formulated by Schwartz as 
early as 1965 [37]; positive results on posterior rates of convergence in the 
same spirit were obtained in Ghosal, Ghosh and van der Vaart [16] (see also
Shen and Wasserman [40]). The combined message of negative and positive
results appears to be that the choice of a nonparametric prior is a sensitive 
one that leaves room for unintended consequences unless due care is taken. 
This lesson must also be taken seriously when one asks the question 
whether the posterior for the parameter of interest in a semiparametric
estimation problem displays Bernstein-von Mises-type limiting behavior. As
in the parametric case, we estimate a finite-dimensional parameter $\theta \in \Theta$,
but now in a model $\mathscr{P}$ that also leaves room for an infinite-dimensional
nuisance parameter $\eta \in H$. We look for general sufficient conditions on model
and prior such that the marginal posterior for the parameter of interest
satisfies

(1.2)  $\sup_B \bigl| \Pi(\sqrt{n}(\theta - \theta_0) \in B \mid X_1, \ldots, X_n) - N_{\tilde\Delta_n, \tilde I^{-1}_{\theta_0,\eta_0}}(B) \bigr| \to 0$

in $P_0$-probability, where

(1.3)  $\tilde\Delta_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n \tilde I^{-1}_{\theta_0,\eta_0} \tilde\ell_{\theta_0,\eta_0}(X_i)$.

Here $\tilde\ell_{\theta,\eta}$ denotes the efficient score function and $\tilde I_{\theta,\eta}$ the efficient Fisher
information [assumed to be nonsingular at $(\theta_0, \eta_0)$]. The sequence $\tilde\Delta_n$ also
features on the r.h.s. of the semiparametric version of (1.1) (see Lemma 25.23
in [43]). Assertion (1.2) often implies efficiency of point-estimators like the 
posterior median, mode or mean (a first condition being that the estimate is 
a functional on R, continuous in total variation [24, 43]) and always leads to
asymptotic identification of credible regions with efficient confidence regions.
To illustrate, if C is a credible set in $\Theta$, (1.2) guarantees that posterior
coverage and coverage under the limiting normal for C are (close to) equal. 
Because the limiting normals are also the asymptotic sampling distributions 
for efficient point-estimators, (1.2) enables interpretation of credible sets as 
asymptotically efficient confidence regions. From a practical point of view, 
the latter conclusion has an important implication: whereas it can be hard 
to compute optimal semiparametric confidence regions directly, simulation 
of a large sample from the marginal posterior (e.g., by MCMC techniques;
see Robert [36]) is sometimes comparatively straightforward. 

Instances of the Bernstein-von Mises limit have been studied in various 
semiparametric models: several papers have provided studies of asymptotic 
normality of posterior distributions for models from survival analysis. In
particular, Kim and Lee [22] show that the infinite-dimensional posterior for the
cumulative hazard function under right-censoring converges at rate $n^{-1/2}$ to
a Gaussian centered at the Aalen-Nelson estimator for a class of
neutral-to-the-right process priors. In Kim [21], the posterior for the baseline
cumulative hazard function and regression coefficients in Cox's proportional hazard
model is considered with similar priors. Castillo [6] considers marginal
posteriors in Cox's proportional hazards model and Stein's symmetric lo- 
cation problem from a unified point of view. A general approach has been 
given in Shen [39], but his conditions may prove somewhat hard to verify 
in examples. Cheng and Kosorok [8] give a general perspective too, proving 
weak convergence of the posterior under sufficient conditions. Rivoirard and 
Rousseau [35] prove a version for linear functionals over the model, using 
a class of nonparametric priors based on infinite-dimensional exponential 
families. Boucheron and Gassiat [4] consider the Bernstein-von Mises the- 
orem for families of discrete distributions. Johnstone [20] studies various 
marginal posteriors in the Gaussian sequence model. 

Notation and conventions. The (frequentist) true distribution of the data
is denoted $P_0$ and assumed to lie in $\mathscr{P}$, so that there exist $\theta_0 \in \Theta$, $\eta_0 \in H$
such that $P_0 = P_{\theta_0,\eta_0}$. We localize $\theta$ by introducing $h = \sqrt{n}(\theta - \theta_0)$ with
inverse $\theta_n(h) = \theta_0 + n^{-1/2} h$. The expectation of a random variable $f$ with
respect to a probability measure $P$ is denoted $Pf$; the sample average
of $g(X)$ is denoted $\mathbb{P}_n g(X) = (1/n) \sum_{i=1}^n g(X_i)$ and
$\mathbb{G}_n g(X) = n^{1/2} (\mathbb{P}_n g(X) - P g(X))$ (for other conventions and nomenclature
customary in empirical process theory, see [45]). If $h_n$ is stochastic,
$P^n_{\theta_n(h_n),\eta} f$ denotes the integral
$\int f(\omega) \, (dP^n_{\theta_n(h_n(\omega)),\eta} / dP^n_0)(\omega) \, dP^n_0(\omega)$.
The Hellinger distance between $P$ and $P'$ is denoted $H(P, P')$ and induces a metric
$d_H$ on the space of nuisance parameters $H$ by
$d_H(\eta, \eta') = H(P_{\theta_0,\eta}, P_{\theta_0,\eta'})$, for all $\eta, \eta' \in H$. We endow the
model with the Borel $\sigma$-algebra generated by the Hellinger topology and
refer to [16] regarding issues of measurability.

2. Main results. Consider estimation of a functional $\theta : \mathscr{P} \to \mathbb{R}^k$ on a
dominated nonparametric model $\mathscr{P}$ with metric $g$, based on a sample $X_1, X_2, \ldots$,
i.i.d. according to $P_0 \in \mathscr{P}$. We introduce a prior $\Pi$ on $\mathscr{P}$ and consider the
subsequent sequence of posteriors,

(2.1)  $\Pi(A \mid X_1, \ldots, X_n) = \int_A \prod_{i=1}^n p(X_i) \, d\Pi(P) \Big/ \int_{\mathscr{P}} \prod_{i=1}^n p(X_i) \, d\Pi(P)$,

where $A$ is any measurable model subset. Typically, optimal (e.g., minimax)
nonparametric posterior rates of convergence [16] are powers of $n$
(possibly modified by a slowly varying function) that converge to zero more
slowly than the parametric $n^{-1/2}$-rate. Estimators for $\theta$ may be derived by
"plugging in" a nonparametric estimate [cf. $\hat\theta = \theta(\hat P)$], but optimality in
rate or asymptotic variance cannot be expected to obtain generically in this
way. This does not preclude efficient estimation of real-valued aspects of $P_0$:
parametrize the model in terms of a finite-dimensional parameter of interest
$\theta \in \Theta$ and a nuisance parameter $\eta \in H$ where $\Theta$ is open in $\mathbb{R}^k$ and $(H, d_H)$ an
infinite-dimensional metric space: $\mathscr{P} = \{P_{\theta,\eta} : \theta \in \Theta, \eta \in H\}$. Assuming
identifiability, there exist unique $\theta_0 \in \Theta$, $\eta_0 \in H$ such that $P_0 = P_{\theta_0,\eta_0}$. Assuming
measurability of the map $(\theta, \eta) \mapsto P_{\theta,\eta}$, we place a product prior $\Pi_\Theta \times \Pi_H$ on
$\Theta \times H$ to define a prior on $\mathscr{P}$. Parametric rates for the marginal posterior
of $\theta$ are achievable because it is possible for contraction of the full posterior
to occur anisotropically, that is, at rate $n^{-1/2}$ along the $\theta$-direction, but at
a slower, nonparametric rate $(\rho_n)$ along the $\eta$-directions.

2.1. Method of proof. The proof of (1.2) will consist of three steps: in 
Section 3, we show that the posterior concentrates its mass around so-called 
least-favorable submodels (see Stein [42] and [1, 43]). In the second step (see
Section 4), we show that this implies local asymptotic normality (LAN) 
for integrals of the likelihood over H, with the efficient score determining 
the expansion. In Section 5, it is shown that these LAN integrals induce 
asymptotic normality of the marginal posterior, analogous to the way lo- 
cal asymptotic normality of parametric likelihoods induces the parametric 
Bernstein-von Mises theorem.

To see why asymptotic accumulation of posterior mass occurs around 
so-called least-favorable submodels, a crude argument departs from the ob- 
servation that, according to (2.1), posterior concentration occurs in regions 
of the model with relatively high (log-)likelihood (barring inhomogeneities 
of the prior). Asymptotically, such regions are characterized by
close-to-minimal Kullback-Leibler divergence with respect to $P_0$. To exploit this, let
us assume that for each $\theta$ in a neighborhood $U_0$ of $\theta_0$, there exists a unique
minimizer $\eta^*(\theta)$ of the Kullback-Leibler divergence,

(2.2)  $-P_0 \log \frac{p_{\theta,\eta^*(\theta)}}{p_{\theta_0,\eta_0}} = \inf_{\eta \in H} \Bigl( -P_0 \log \frac{p_{\theta,\eta}}{p_{\theta_0,\eta_0}} \Bigr)$,

giving rise to a submodel $\mathscr{P}^* = \{P^*_\theta = P_{\theta,\eta^*(\theta)} : \theta \in U_0\}$. As is well known [38],
if $\mathscr{P}^*$ is smooth it constitutes a least-favorable submodel and scores along $\mathscr{P}^*$
are efficient. [In subsequent sections it is not required that $\mathscr{P}^*$ is defined
by (2.2), only that $\mathscr{P}^*$ is least-favorable.] Neighborhoods of $\mathscr{P}^*$ are
described with Hellinger balls in $H$ of radius $\rho > 0$ around $\eta^*(\theta)$, for all $\theta \in U_0$,

(2.3)  $D(\theta, \rho) = \{\eta \in H : d_H(\eta, \eta^*(\theta)) < \rho\}$.
To give a more precise argument for posterior concentration around $\eta^*(\theta)$,
consider the posterior for $\eta$, given $\theta \in U_0$; unless $\theta$ happens to be equal
to $\theta_0$, the submodel $\mathscr{P}_\theta = \{P_{\theta,\eta} : \eta \in H\}$ is misspecified. Kleijn and van der
Vaart [27] show that the misspecified posterior concentrates asymptotically
in any (Hellinger) neighborhood of the point of minimal Kullback-Leibler
divergence with respect to the true distribution of the data. Applied to $\mathscr{P}_\theta$,
we see that $D(\theta, \rho)$ receives asymptotic posterior probability one for any
$\rho > 0$. For posterior concentration to occur [16, 27] sufficient prior mass must
be present in certain Kullback-Leibler-type neighborhoods. In the present
context, these neighborhoods can be defined as

(2.4)  $K_n(\rho, M) = \Bigl\{ \eta \in H : P_0 \Bigl( \sup_{\|h\| \le M} -\log \frac{p_{\theta_n(h),\eta}}{p_{\theta_0,\eta_0}} \Bigr) \le \rho^2, \ P_0 \Bigl( \sup_{\|h\| \le M} -\log \frac{p_{\theta_n(h),\eta}}{p_{\theta_0,\eta_0}} \Bigr)^2 \le \rho^2 \Bigr\}$

for $\rho > 0$ and $M > 0$. If this type of posterior convergence occurs with an
appropriate form of uniformity over the relevant values of $\theta$ (see "consistency
under perturbation," Section 3), one expects that the nonparametric
posterior contracts into Hellinger neighborhoods of the curve $\theta \mapsto (\theta, \eta^*(\theta))$
(Theorem 3.1 and Corollary 3.3).

To introduce the second step, consider (2.1) with $A = B \times H$ for some
measurable $B \subset \Theta$. Since the prior is of product form, $\Pi = \Pi_\Theta \times \Pi_H$, the
marginal posterior for the parameter $\theta \in \Theta$ depends on the nuisance factor
only through the integrated likelihood ratio,

(2.5)  $s_n : \Theta \to \mathbb{R} : \theta \mapsto \int_H \prod_{i=1}^n \frac{p_{\theta,\eta}}{p_{\theta_0,\eta_0}}(X_i) \, d\Pi_H(\eta)$,

where we have introduced factors $p_{\theta_0,\eta_0}(X_i)$ in the denominator for later
convenience; see (5.1). [The localized version of (2.5) is denoted $h \mapsto s_n(h)$;
see (4.1).] The map $s_n$ is to be viewed in a role similar to that of the
profile likelihood in semiparametric maximum-likelihood methods (see, e.g.,
Severini and Wong [38] and Murphy and van der Vaart [34]), in the sense
that $s_n$ embodies the intermediate stage between nonparametric and
semiparametric steps of the estimation procedure.
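
The role of the integrated likelihood ratio (2.5) can be sketched numerically. The example below is a toy stand-in and not the setting of the paper: it assumes observations from N(theta, eta^2), replaces the infinite-dimensional nuisance space by three hypothetical scale values with prior weights, and uses a flat prior on a grid of theta values. All names and numbers are illustrative.

```python
import math


def normal_logpdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def log_s_n(theta, data, etas, weights, theta0, eta0):
    """log of s_n(theta): prior-weighted sum over nuisance values of the
    likelihood ratio prod_i p_{theta,eta}(X_i) / p_{theta0,eta0}(X_i)."""
    terms = []
    for eta, w in zip(etas, weights):
        log_ratio = sum(normal_logpdf(x, theta, eta) - normal_logpdf(x, theta0, eta0)
                        for x in data)
        terms.append(math.log(w) + log_ratio)
    top = max(terms)                      # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def marginal_posterior(theta_grid, data, etas, weights, theta0, eta0):
    """Marginal posterior weights on theta_grid under a flat prior on the grid."""
    logs = [log_s_n(t, data, etas, weights, theta0, eta0) for t in theta_grid]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical data and a three-point nuisance prior.
data = [0.8, 1.3, 0.9, 1.6, 1.1, 0.7, 1.4, 1.2]
theta_grid = [0.025 * i for i in range(91)]
post = marginal_posterior(theta_grid, data, etas=[0.5, 1.0, 2.0],
                          weights=[0.25, 0.5, 0.25], theta0=1.0, eta0=1.0)
mode = theta_grid[post.index(max(post))]
print(mode, sum(post))
```

The marginal posterior depends on the nuisance only through `log_s_n`, mirroring the way the product prior enters (2.5); the reference point (theta0, eta0) cancels on normalization.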

We impose smoothness through a form of Le Cam's local asymptotic
normality: let $P \in \mathscr{P}$ be given, and let $t \mapsto P_t$ be a one-dimensional submodel
of $\mathscr{P}$ such that $P_{t=0} = P$. Specializing to i.i.d. observations, we say that
the model is stochastically LAN at $P \in \mathscr{P}$ along the direction $t \mapsto P_t$, if
there exists an $L_2(P)$-function $g_P$ with $P g_P = 0$ such that for all random
sequences $(h_n)$ bounded in $P$-probability,

(2.6)  $\log \prod_{i=1}^n \frac{p_{n^{-1/2} h_n}}{p_0}(X_i) = \frac{1}{\sqrt{n}} \sum_{i=1}^n h_n^T g_P(X_i) - \frac{1}{2} h_n^T I_P h_n + o_P(1)$.

Here $g_P$ is the score-function, and $I_P = P (g_P)^2$ is the Fisher information of
the submodel at $P$. Stochastic LAN is slightly stronger than the usual LAN
property [28, 31]. In examples, the proof of the ordinary LAN property often
extends to stochastic LAN without significant difficulties.
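
As a sanity check on the form of expansion (2.6), consider the simplest case, in which the remainder vanishes identically. The sketch below assumes the Gaussian location model N(theta, 1), with score g(x) = x - theta0 and Fisher information 1; the model, the data and the function names are illustrative assumptions, not taken from the paper.

```python
import math


def log_likelihood_ratio(data, theta0, h):
    """log prod_i p_{theta0 + h/sqrt(n)}(X_i) / p_{theta0}(X_i) for N(theta, 1)."""
    n = len(data)
    theta_n = theta0 + h / math.sqrt(n)
    return sum(-0.5 * (x - theta_n) ** 2 + 0.5 * (x - theta0) ** 2 for x in data)


def lan_expansion(data, theta0, h):
    """Right-hand side of (2.6) without remainder, with g(x) = x - theta0, I = 1."""
    n = len(data)
    score_sum = sum(x - theta0 for x in data)
    return h * score_sum / math.sqrt(n) - 0.5 * h * h


# Hypothetical sample; the difference is zero up to rounding for every h.
data = [0.3, -1.2, 0.8, 2.1, -0.4, 0.9, 1.5, -0.7, 0.2, 1.1]
for h in (-2.0, 0.5, 3.0):
    print(h, log_likelihood_ratio(data, 0.0, h) - lan_expansion(data, 0.0, h))
```

Because the identity is exact here, it holds for data-dependent h as well, which is the extra requirement that distinguishes stochastic LAN from ordinary LAN; in non-Gaussian models the remainder has to be controlled separately.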

Although formally only a convenience, the presentation benefits from an
adaptive reparametrization (see Section 2.4 of Bickel et al. [1]): based on the
least-favorable submodel $\eta^*$, we define, for all $\theta \in U_0$, $\eta \in H$,

(2.7)  $(\theta, \eta(\theta, \zeta)) = (\theta, \eta^*(\theta) + \zeta), \qquad (\theta, \zeta(\theta, \eta)) = (\theta, \eta - \eta^*(\theta))$,

and we introduce the notation $Q_{\theta,\zeta} = P_{\theta,\eta(\theta,\zeta)}$. With $\zeta = 0$, $\theta \mapsto Q_{\theta,0}$
describes the least-favorable submodel $\mathscr{P}^*$ and with a nonzero value of $\zeta$,
$\theta \mapsto Q_{\theta,\zeta}$ describes a version thereof, translated over a nuisance direction (see
Figure 2). Expressed in terms of the metric $r_H(\zeta_1, \zeta_2) = H(Q_{\theta_0,\zeta_1}, Q_{\theta_0,\zeta_2})$, the
sets $D(\theta, \rho)$ are mapped to open balls $B(\rho) = \{\zeta \in H : r_H(\zeta, 0) < \rho\}$ centered
at the origin $\zeta = 0$,

$\{P_{\theta,\eta} : \theta \in U_0, \eta \in D(\theta, \rho)\} = \{Q_{\theta,\zeta} : \theta \in U_0, \zeta \in B(\rho)\}$.

In the formulation of Theorem 2.1, we make use of a domination condition
based on the quantities

$U_n(\rho, h) = \sup_{\zeta \in B(\rho)} Q^n_{\theta_0,\zeta} \Bigl( \prod_{i=1}^n \frac{q_{\theta_n(h),\zeta}}{q_{\theta_0,\zeta}}(X_i) \Bigr)$

for all $\rho > 0$ and $h \in \mathbb{R}^k$. Below, it is required that there exists a sequence $(\rho_n)$
with $\rho_n \downarrow 0$, $n\rho_n^2 \to \infty$, such that, for every bounded, stochastic sequence $(h_n)$,
$U_n(\rho_n, h_n) = O(1)$ (where the expectation concerns the stochastic dependence
of $h_n$ as well; see Notation and conventions). For a single, fixed $\zeta$, the
requirement says that the likelihood ratio remains integrable when we replace
$\theta_n(h_n)$ by the maximum-likelihood estimator $\hat\theta_n(X_1, \ldots, X_n)$. Lemma 4.3
demonstrates that ordinary differentiability of the likelihood-ratio with
respect to $h$, combined with a uniform upper bound on certain Fisher
information coefficients, suffices to satisfy $U_n(\rho_n, h_n) = O(1)$ for all bounded,
stochastic $(h_n)$ and every $\rho_n \downarrow 0$.

The second step of the proof can now be summarized as follows: assuming
stochastic LAN of the model, contraction of the nuisance posterior as
in Figure 1 and said domination condition are enough to turn LAN expansions
for the integrand in (2.5) into a single LAN expansion for $s_n$. The
latter is determined by the efficient score, because the locus of posterior
concentration, $\mathscr{P}^*$, is a least-favorable submodel (see Theorem 4.2).

The third step is based on two observations: first, in a semiparametric
problem, the integrals $s_n$ appear in the expression for the marginal posterior
in exactly the same way as parametric likelihood ratios appear in the
posterior for parametric problems. Second, the parametric Bernstein-von
Mises proof depends on likelihood ratios only through the LAN property.
As a consequence, local asymptotic normality for $s_n$ offers the possibility to
apply Le Cam's proof of posterior asymptotic normality in semiparametric
context. If, in addition, we impose contraction at parametric rate for the
marginal posterior, the LAN expansion of $s_n$ leads to the conclusion that
the marginal posterior satisfies the Bernstein-von Mises assertion (1.2); see
Theorem 5.1.

Fig. 1. A neighborhood of $(\theta_0, \eta_0)$. Shown are the least-favorable curve
$\{(\theta, \eta^*(\theta)) : \theta \in U_0\}$ and (for fixed $\theta$ and $\rho > 0$) the neighborhood $D(\theta, \rho)$
of $\eta^*(\theta)$. The sets $D(\theta, \rho)$ are expected to capture ($\theta$-conditional) posterior
mass one asymptotically, for all $\rho > 0$ and $\theta \in U_0$.

Fig. 2. A neighborhood of $(\theta_0, \eta_0)$. Curved lines represent sets
$\{(\theta, \zeta) : \theta \in U_0\}$ for fixed $\zeta$. The curve through $\zeta = 0$ parametrizes the
least-favorable submodel. Vertical dashed lines delimit regions such that
$\|\theta - \theta_0\| \le n^{-1/2}$. Also indicated are directions along which the likelihood
is expanded, with score functions $g_\zeta$.

2.2. Main theorem. Before we state the main result of this paper, general 
conditions imposed on models and priors are formulated: 

(i) Model assumptions. Throughout the remainder of this article, $\mathscr{P}$ is
assumed to be well specified and dominated by a $\sigma$-finite measure on the
sample space and parametrized identifiably on $\Theta \times H$, with $\Theta \subset \mathbb{R}^k$ open
and $H$ a subset of a metric vector-space with metric $d_H$. Smoothness of the
model is required but mentioned explicitly throughout. We also assume that
there exists an open neighborhood $U_0 \subset \Theta$ of $\theta_0$ on which a least-favorable
submodel $\eta^* : U_0 \to H$ is defined.

(ii) Prior assumptions. With regard to the prior $\Pi$ we follow the product
structure of the parametrization of $\mathscr{P}$, by endowing the parameter space
$\Theta \times H$ with a product-prior $\Pi_\Theta \times \Pi_H$ defined on a $\sigma$-field that includes the
Borel $\sigma$-field generated by the product-topology. Also, it is assumed that
the prior $\Pi_\Theta$ is thick at $\theta_0$.

With the above general considerations for model and prior in mind, we 
formulate the main result of this paper. 

Theorem 2.1 (Semiparametric Bernstein-von Mises). Let $X_1, X_2, \ldots$ be
distributed i.i.d.-$P_0$, with $P_0 \in \mathscr{P}$, and let $\Pi_\Theta$ be thick at $\theta_0$. Suppose that
for large enough $n$, the map $h \mapsto s_n(h)$ is continuous $P_0^n$-almost-surely. Also
assume that $\theta \mapsto Q_{\theta,\zeta}$ is stochastically LAN in the $\theta$-direction, for all $\zeta$ in
an $r_H$-neighborhood of $\zeta = 0$ and that the efficient Fisher information $\tilde I_{\theta_0,\eta_0}$
is nonsingular. Furthermore, assume that there exists a sequence $(\rho_n)$ with
$\rho_n \downarrow 0$, $n\rho_n^2 \to \infty$ such that:

(i) For all $M > 0$, there exists a $K > 0$ such that, for large enough $n$,

$\Pi_H(K_n(\rho_n, M)) \ge e^{-Kn\rho_n^2}$.

(ii) For all $n$ large enough, the Hellinger metric entropy satisfies

$N(\rho_n, H, d_H) \le e^{n\rho_n^2}$

and, for every bounded, stochastic $(h_n)$:

(iii) The model satisfies the domination condition,

(2.8)  $U_n(\rho_n, h_n) = O(1)$.

(iv) For all $L > 0$, Hellinger distances satisfy the uniform bound,

$\sup_{\{\eta \in H : d_H(\eta, \eta_0) \ge L\rho_n\}} \frac{H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta})}{H(P_{\theta_0,\eta}, P_0)} = o(1)$.

Finally, suppose that

(v) for every $(M_n)$, $M_n \to \infty$, the posterior satisfies

$\Pi_n(\|h\| \le M_n \mid X_1, \ldots, X_n) \xrightarrow{P_0} 1$.

Then the sequence of marginal posteriors for $\theta$ converges in total variation
to a normal distribution,

(2.9)  $\sup_A \bigl| \Pi_n(h \in A \mid X_1, \ldots, X_n) - N_{\tilde\Delta_n, \tilde I^{-1}_{\theta_0,\eta_0}}(A) \bigr| \xrightarrow{P_0} 0$,

centered on $\tilde\Delta_n$ with covariance matrix $\tilde I^{-1}_{\theta_0,\eta_0}$.

Proof. The assertion follows from combination of Theorem 3.1,
Corollary 3.3, Theorems 4.2 and 5.1. □

Let us briefly discuss some aspects of the conditions of Theorem 2.1.
First, consider the required existence of a least-favorable submodel in $\mathscr{P}$. In
many semiparametric problems, the efficient score function is not a proper
score in the sense that it corresponds to a smooth submodel; instead, the
efficient score lies in the $L_2$-closure of the set of all proper scores. So there
exist sequences of so-called approximately least-favorable submodels whose
scores converge to the efficient score in $L_2$ [43]. Using such approximations
of $\mathscr{P}^*$, our proof will entail extra conditions, but there is no reason to
expect problems of an overly restrictive nature. It may therefore be hoped
that the result remains largely unchanged if we turn (2.7) into a sequence of
reparametrizations based on suitably chosen approximately least-favorable
submodels.

Second, consider the rate $(\rho_n)$, which must be slow enough to satisfy
condition (iv) and is fixed at (or above) the minimax Hellinger rate for
estimation of the nuisance with known $\theta_0$ by condition (ii), while satisfying (i)
and (iii) as well. Conditions (i) and (ii) also arise when considering Hellinger
rates for nonparametric posterior convergence and the methods of Ghosal et
al. [16] can be applied in the present context with minor modifications. In
addition, Lemma 4.3 shows that in a wide class of semiparametric models,
condition (iii) is satisfied for any rate sequence $(\rho_n)$. Typically, the
numerator in condition (iv) is of order $O(n^{-1/2})$, so that condition (iv) holds true
for any $\rho_n$ such that $n\rho_n^2 \to \infty$. The above enables a rate-free version of
the semiparametric Bernstein-von Mises theorem (Corollary 5.2), in which
conditions (i) and (ii) above are weakened to become comparable to those
of Schwartz [37] for nonparametric posterior consistency. Applicability of
Corollary 5.2 is demonstrated in Section 7, where the linear coefficient in
the partial linear regression model is estimated.
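
The interplay between the nuisance rate and the parametric rate in the second remark can be tabulated directly. The sketch below assumes, purely for illustration, a nuisance rate of the form rho_n = n^(-alpha/(2 alpha + 1)), the familiar rate for alpha-smooth function classes; this choice, the value alpha = 1 and the function names are assumptions and are not prescribed by Theorem 2.1.

```python
def nuisance_rate(n, alpha):
    """Hypothetical nuisance rate rho_n = n^(-alpha / (2 * alpha + 1))."""
    return n ** (-alpha / (2.0 * alpha + 1.0))


def rate_diagnostics(n, alpha):
    """Quantities governing the rate requirements: n * rho_n^2 should diverge,
    and n^(-1/2) / rho_n (the typical order of the ratio in condition (iv))
    should vanish."""
    rho = nuisance_rate(n, alpha)
    return {
        "rho_n": rho,
        "n_rho_n_sq": n * rho ** 2,
        "ratio": n ** -0.5 / rho,
    }


for n in (10 ** 3, 10 ** 6, 10 ** 9):
    print(n, rate_diagnostics(n, alpha=1.0))
```

Since the ratio equals (n rho_n^2)^(-1/2), the two diagnostics express the same requirement: any rate slower than the parametric rate makes the perturbation in the direction of the parameter of interest asymptotically negligible on the nuisance scale.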

Third, consider condition (v) of Theorem 2.1: though it is necessary [as
it follows from (2.9)], it is hard to formulate straightforward sufficient
conditions to satisfy (v) in generality. Moreover, condition (v) involves the
nuisance prior and, as such, imposes another condition on $\Pi_H$ besides (i). To
lessen its influence on $\Pi_H$, constructions in Section 6 either work for all
nuisance priors (see Lemma 6.1) or require only consistency of the nuisance
posterior (see Theorem 6.2). The latter is based on the limiting behavior of
posteriors in misspecified parametric models [24, 26] and allows for the
tentative but general observation that a bias [cf. (6.6)] may ruin $n^{-1/2}$-consistency
of the marginal posterior, especially if the rate $(\rho_n)$ is sub-optimal. In the
example of Section 7, the "hard work" stems from condition (v) of
Theorem 2.1: $\alpha > 1/2$ Hölder smoothness and boundedness of the family of
regression functions in Corollary 7.2 are imposed in order to satisfy this
condition. Since conditions (i) and (ii) appear quite reasonable and
conditions (iii) and (iv) are satisfied relatively easily, condition (v) should be
viewed as the most complicated in an essential way.

To conclude, consistency under perturbation (with appropriate rate) is
one of the sufficient conditions, but it is by no means clear to what extent
it is also necessary. One expects that in some situations where
consistency under perturbation fails to hold fully, integral local asymptotic 
normality (see Section 4) is still satisfied in a weaker form. In particular, it is 
possible that (4.2) holds with a less-than-efficient score and Fisher informa- 
tion, a result that would have an interpretation analogous to suboptimality 
in Hajek's convolution theorem. What happens in cases where integral LAN 
fails more comprehensively is both interesting and completely mysterious 
from the point of view taken in this article. 

3. Posterior convergence under perturbation. In this section, we con- 
sider contraction of the posterior around least-favorable submodels. We ex- 
press this form of posterior convergence by showing that (under suitable 
conditions) the conditional posterior for the nuisance parameter contracts 
around the least-favorable submodel, conditioned on a sequence $\theta_n(h_n)$ for
the parameter of interest with $h_n = O_{P_0}(1)$. We view the sequence of
models $\mathscr{P}_{\theta_n(h_n)}$ as a random perturbation of the model $\mathscr{P}_{\theta_0}$ and generalize
Ghosal et al. [16] to describe posterior contraction. Ultimately, random
perturbation of $\theta$ represents the "appropriate form of uniformity" referred to
just after definition (2.4). Given a rate sequence $(\rho_n)$, $\rho_n \downarrow 0$, we say that
the conditioned nuisance posterior is consistent under $n^{-1/2}$-perturbation at
rate $\rho_n$, if

(3.1)  $\Pi_n(D^c(\theta, \rho_n) \mid \theta = \theta_0 + n^{-1/2} h_n; X_1, \ldots, X_n) \xrightarrow{P_0} 0$

for all bounded, stochastic sequences $(h_n)$.
Theorem 3.1 (Posterior rate of convergence under perturbation).
Assume that there exists a sequence $(\rho_n)$ with $\rho_n \downarrow 0$, $n\rho_n^2 \to \infty$ such that for
all $M > 0$ and every bounded, stochastic $(h_n)$:

(i) There exists a constant $K > 0$ such that for large enough $n$,

(3.2)  $\Pi_H(K_n(\rho_n, M)) \ge e^{-Kn\rho_n^2}$.

(ii) For $L > 0$ large enough, there exist $(\phi_n)$ such that for large enough $n$,

(3.3)  $P_0^n \phi_n \to 0, \qquad \sup_{\eta \in D^c(\theta_0, L\rho_n)} P^n_{\theta_n(h_n),\eta} (1 - \phi_n) \le e^{-L^2 n\rho_n^2/4}$.

(iii) The least-favorable submodel satisfies $d_H(\eta^*(\theta_n(h_n)), \eta_0) = o(\rho_n)$.

Then, for every bounded, stochastic $(h_n)$ there exists an $L > 0$ such that the
conditional nuisance posterior converges as

(3.4)  $\Pi(D^c(\theta, L\rho_n) \mid \theta = \theta_0 + n^{-1/2} h_n; X_1, \ldots, X_n) = o_{P_0}(1)$

under $n^{-1/2}$-perturbation.

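Proof. Let $(h_n)$ be a stochastic sequence bounded by $M$, and let
$0 < C < 1$ be given. Let $K$ and $(\rho_n)$ be as in conditions (i) and (ii). Choose
$L > 4\sqrt{1 + K + C}$ and large enough to satisfy condition (ii) for some $(\phi_n)$.
By Lemma 3.4, the events

$A_n = \Bigl\{ \int_H \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_{\theta_0,\eta_0}}(X_i) \, d\Pi_H(\eta) \ge e^{-(1+C)n\rho_n^2} \, \Pi_H(K_n(\rho_n, M)) \Bigr\}$

satisfy $P_0^n(A_n^c) \to 0$. Using also the first limit in (3.3), we then derive

$P_0^n \Pi(D^c(\theta, L\rho_n) \mid \theta = \theta_n(h_n); X_1, \ldots, X_n)$
$\quad \le P_0^n \Pi(D^c(\theta, L\rho_n) \mid \theta = \theta_n(h_n); X_1, \ldots, X_n) \, 1_{A_n} (1 - \phi_n) + o(1)$

[even with random $(h_n)$, the posterior $\Pi(\cdot \mid \theta = \theta_n(h_n); X_1, \ldots, X_n) \le 1$, by
definition (2.1)]. The first term on the r.h.s. can be bounded further by the
definition of the events $A_n$,

$P_0^n \Pi(D^c(\theta, L\rho_n) \mid \theta = \theta_n; X_1, \ldots, X_n) \, 1_{A_n} (1 - \phi_n)$
$\quad \le \frac{e^{(1+C)n\rho_n^2}}{\Pi_H(K_n(\rho_n, M))} \, P_0^n \Bigl( \int_{D^c(\theta_n(h_n), L\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_{\theta_0,\eta_0}}(X_i) \, (1 - \phi_n) \, d\Pi_H(\eta) \Bigr)$.

Due to condition (iii) it follows that

(3.5)  $D(\theta_0, \tfrac{1}{2} L\rho_n) \subset \bigcap_{n \ge 1} D(\theta_n(h_n), L\rho_n)$

for large enough $n$. Therefore,

(3.6)  $P_0^n \int_{D^c(\theta_n(h_n), L\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_{\theta_0,\eta_0}}(X_i) \, (1 - \phi_n) \, d\Pi_H(\eta) \le \int_{D^c(\theta_0, L\rho_n/2)} P^n_{\theta_n(h_n),\eta} (1 - \phi_n) \, d\Pi_H(\eta)$.

Upon substitution of (3.6) and with the use of the second bound in (3.3)
and (3.2), the choice we made earlier for $L$ proves the assertion. □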

We conclude from the above that besides sufficiency of prior mass, the
crucial condition for consistency under perturbation is the existence of a test
sequence $(\phi_n)$ satisfying (3.3). To find sufficient conditions, we follow a
construction of tests based on the Hellinger geometry of the model, generalizing
the approach of Birgé [2, 3] and Le Cam [30] to $n^{-1/2}$-perturbed context.
It is easiest to illustrate their approach by considering the problem
of testing/estimating $\eta$ when $\theta_0$ is known: we cover the nuisance model
$\{P_{\theta_0,\eta} : \eta \in H\}$ by a minimal collection of Hellinger balls $B$ of radii $(\rho_n)$,
each of which is convex and hence testable against $P_0$ with power bounded
by $\exp(-\frac{1}{4} n H^2(P_0, B))$, based on the minimax theorem [30]. The tests for
the covering Hellinger balls are combined into a single test for the nonconvex
alternative $\{P : H(P, P_0) \ge \rho_n\}$ against $P_0$. The order of the cover controls
the power of the combined test. Therefore the construction requires an upper
bound to Hellinger metric entropy numbers [45]

(3.7)  $N(\rho_n, \mathscr{P}_{\theta_0}, H) \le e^{n\rho_n^2}$,

which is interpreted as indicative of the nuisance model's complexity in the
sense that the lower bound to the collection of rates $(\rho_n)$ solving (3.7) is the
Hellinger minimax rate for estimation of $\eta_0$. In the $n^{-1/2}$-perturbed problem,
the alternative does not just consist of the complement of a Hellinger-ball
in the nuisance factor $H$, but also has an extent in the $\theta$-direction shrinking
at rate $n^{-1/2}$. Condition (3.8) below guarantees that Hellinger covers of $H$
like the above are large enough to accommodate the $\theta$-extent of the
alternative, the implication being that the test sequence one constructs for the
nuisance in case $\theta_0$ is known, can also be used when $\theta_0$ is known only up
to $n^{-1/2}$-perturbation. Therefore, the entropy bound in Lemma 3.2 is (3.7).
Geometrically, (3.8) requires that $n^{-1/2}$-perturbed versions of the nuisance
model are contained in a narrowing sequence of metric cones based at $P_0$. In
differentiable models, the Hellinger distance $H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta})$ is typically
of order $O(n^{-1/2})$ for all $\eta \in H$. So if, in addition, $n\rho_n^2 \to \infty$, limit (3.8) is
expected to hold pointwise in $\eta$. Then only the uniform character of (3.8)
truly forms a condition.
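
The claim that the Hellinger distance between a model and its perturbed version is typically of order n^(-1/2) can be checked in closed form in a toy case. The sketch below assumes a Gaussian location model N(theta, 1) without nuisance and the convention H^2(P, Q) = integral of (sqrt(p) - sqrt(q))^2; the paper's normalization of H may differ by a constant factor. The function names are hypothetical.

```python
import math


def hellinger_normal_location(delta):
    """Hellinger distance between N(mu, 1) and N(mu + delta, 1), using the
    convention H^2(P, Q) = integral of (sqrt(p) - sqrt(q))^2, which gives
    H^2 = 2 * (1 - exp(-delta^2 / 8))."""
    return math.sqrt(-2.0 * math.expm1(-delta * delta / 8.0))


def scaled_perturbation_distance(n, h):
    """sqrt(n) * H(P_{theta0 + h / sqrt(n)}, P_{theta0}) in the location model."""
    return math.sqrt(n) * hellinger_normal_location(h / math.sqrt(n))


for n in (10 ** 2, 10 ** 4, 10 ** 6):
    print(n, scaled_perturbation_distance(n, h=2.0))
```

Under these assumptions the scaled distance stabilizes near |h|/2, so the unscaled distance is of order n^(-1/2). Dividing by a denominator of order rho_n or larger then yields a vanishing ratio whenever n rho_n^2 diverges, which is the pointwise version of (3.8).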
Lemma 3.2 (Testing under perturbation). If $(\rho_n)$ satisfies $\rho_n \downarrow 0$,
$n\rho_n^2 \to \infty$ and the following requirements are met:

(i) For all $n$ large enough, $N(\rho_n, H, d_H) \le e^{n\rho_n^2}$.
(ii) For all $L > 0$ and all bounded, stochastic $(h_n)$,

(3.8)  $\sup_{\{\eta \in H : d_H(\eta, \eta_0) \ge L\rho_n\}} \frac{H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta})}{H(P_{\theta_0,\eta}, P_0)} = o(1)$.

Then for all $L \ge 4$, there exists a test sequence $(\phi_n)$ such that for all bounded,
stochastic $(h_n)$,

(3.9)  $P_0^n \phi_n \to 0, \qquad \sup_{\eta \in D^c(\theta_0, L\rho_n)} P^n_{\theta_n(h_n),\eta} (1 - \phi_n) \le e^{-L^2 n\rho_n^2/4}$

for large enough $n$.

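Proof. Let $(\rho_n)$ be such that (i) and (ii) are satisfied. Let $(h_n)$ and $L \ge 4$ be given. For all $j \ge 1$, define $H_{j,n} = \{\eta \in H : jL\rho_n \le d_H(\eta_0,\eta) \le (j+1)L\rho_n\}$ and $\mathscr{P}_{j,n} = \{P_{\theta_0,\eta} : \eta \in H_{j,n}\}$. Cover $\mathscr{P}_{j,n}$ with Hellinger balls $B_{i,j,n}(\tfrac14 jL\rho_n)$, where

$B_{i,j,n}(r) = \{P : H(P_{i,j,n}, P) \le r\}$

and $P_{i,j,n} \in \mathscr{P}_{j,n}$, that is, there exists an $\eta_{i,j,n} \in H_{j,n}$ such that $P_{i,j,n} = P_{\theta_0,\eta_{i,j,n}}$. Denote $H_{i,j,n} = \{\eta \in H_{j,n} : P_{\theta_0,\eta} \in B_{i,j,n}(\tfrac14 jL\rho_n)\}$. By assumption, the minimal number of such balls needed to cover $\mathscr{P}_{j,n}$ is finite; we denote the corresponding covering number by $N_{j,n}$, that is, $1 \le i \le N_{j,n}$.

Let $\eta \in H_{j,n}$ be given. There exists an $i$ ($1 \le i \le N_{j,n}$) such that $d_H(\eta, \eta_{i,j,n}) \le \tfrac14 jL\rho_n$. Then, by the triangle inequality, the definition of $H_{j,n}$ and assumption (3.8),

(3.10)  $H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta_{i,j,n}})$
        $\le H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta}) + H(P_{\theta_0,\eta}, P_{\theta_0,\eta_{i,j,n}})$
        $\le \frac{H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta})}{H(P_{\theta_0,\eta}, P_0)}\,H(P_{\theta_0,\eta}, P_0) + \tfrac14 jL\rho_n$
        $\le \Big(\sup_{\eta \in D^c(\theta_0, L\rho_n)} \frac{H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta})}{H(P_{\theta_0,\eta}, P_0)}\Big)(j+1)L\rho_n + \tfrac14 jL\rho_n \le \tfrac12 jL\rho_n$
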
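for large enough $n$. We conclude that there exists an $N \ge 1$ such that for all $n \ge N$, $j \ge 1$, $1 \le i \le N_{j,n}$ and $\eta \in H_{i,j,n}$, $P_{\theta_n(h_n),\eta} \in B_{i,j,n}(\tfrac12 jL\rho_n)$. Moreover, Hellinger balls are convex and for all $P \in B_{i,j,n}(\tfrac12 jL\rho_n)$, $H(P, P_0) \ge \tfrac12 jL\rho_n$. As a consequence of the minimax theorem (see Le Cam [30], Birgé [2, 3]), there exists a test sequence $(\phi_{i,j,n})_{n \ge 1}$ such that

$P_0^n\phi_{i,j,n} \vee \sup_P P^n(1 - \phi_{i,j,n}) \le e^{-L^2 j^2 n\rho_n^2/4},$

where the supremum runs over all $P \in B_{i,j,n}(\tfrac12 jL\rho_n)$. Defining, for all $n \ge 1$, $\phi_n = \sup_{j \ge 1}\max_{1 \le i \le N_{j,n}} \phi_{i,j,n}$, we find (for details, see the proof of Theorem 3.10 in [24]) that

(3.11)  $P_0^n\phi_n \le \sum_{j \ge 1} N_{j,n}\,e^{-L^2 j^2 n\rho_n^2/4}, \qquad P^n(1 - \phi_n) \le e^{-L^2 n\rho_n^2/4}$

for all $P = P_{\theta_n(h_n),\eta}$ and $\eta \in D^c(\theta_0, L\rho_n)$. Since $L \ge 4$, we have for all $j \ge 1$,

(3.12)  $N_{j,n} = N(\tfrac14 jL\rho_n, \mathscr{P}_{j,n}, H) \le N(\rho_n, \mathscr{P}, H) \le e^{n\rho_n^2}$

by assumption (3.7). Upon substitution of (3.12) into (3.11), we obtain the following bounds:

$P_0^n\phi_n \le \sum_{j \ge 1} e^{n\rho_n^2 - L^2 j^2 n\rho_n^2/4} \le \frac{e^{(1 - L^2/4)n\rho_n^2}}{1 - e^{-L^2 n\rho_n^2/4}}$

for large enough $n$, which implies assertion (3.9). □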
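
The summation step that closes this proof can be checked numerically. The following is only an illustrative sketch, not part of the argument: writing $x$ for $n\rho_n^2$, it compares a truncated version of $\sum_{j \ge 1} e^{x - L^2 j^2 x/4}$ with the geometric-series bound $e^{(1 - L^2/4)x}/(1 - e^{-L^2 x/4})$, which follows from $j^2 \ge j$. The function names and the truncation at 200 terms are assumptions made here.

```python
import math


def type_one_error_sum(L, x, terms=200):
    """Truncated sum over j >= 1 of exp(x - L^2 j^2 x / 4), with x = n * rho_n^2."""
    return sum(math.exp(x - L * L * j * j * x / 4.0) for j in range(1, terms + 1))


def geometric_bound(L, x):
    """Geometric-series bound exp((1 - L^2/4) x) / (1 - exp(-L^2 x / 4))."""
    return math.exp((1.0 - L * L / 4.0) * x) / (1.0 - math.exp(-L * L * x / 4.0))


if __name__ == "__main__":
    # Both columns shrink rapidly as x = n * rho_n^2 grows, for L = 4.
    for x in (0.5, 1.0, 5.0, 20.0):
        print(x, type_one_error_sum(4.0, x), geometric_bound(4.0, x))
```

For $L = 4$ the exponent $(1 - L^2/4)x = -3x$ is negative, so the bound vanishes as $n\rho_n^2 \to \infty$, in line with the first part of (3.9).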

In preparation of Corollary 5.2, we also provide a version of Theorem 3.1 that only asserts consistency under $n^{-1/2}$-perturbation at some rate while relaxing bounds for prior mass and entropy. In the statement of the corollary, we make use of the family of Kullback-Leibler neighborhoods that would play a role for the posterior of the nuisance if $\theta_0$ were known [16]:

(3.13)  $K(\rho) = \Big\{\eta \in H : -P_0\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}} \le \rho^2,\ P_0\Big(\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}\Big)^2 \le \rho^2\Big\}$

for all $\rho > 0$. The proof below follows steps similar to those in the proof of Corollary 2.1 in [27].

Corollary 3.3 (Posterior consistency under perturbation). Assume that for all $\rho > 0$, $N(\rho, H, d_H) < \infty$, $\Pi_H(K(\rho)) > 0$ and:

(i) For all $M > 0$ there is an $L > 0$ such that for all $\rho > 0$ and large enough $n$, $K(\rho) \subset K_n(L\rho, M)$.

(ii) For every bounded random sequence $(h_n)$, $\sup_{\eta \in H} H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta})$ and $H(P_{\theta_0,\eta^*(\theta_n(h_n))}, P_{\theta_0,\eta_0})$ are of order $O(n^{-1/2})$.

Then there exists a sequence $(\rho_n)$, $\rho_n \downarrow 0$, $n\rho_n^2 \to \infty$, such that the conditional nuisance posterior converges under $n^{-1/2}$-perturbation at rate $(\rho_n)$.

Proof. We follow the proof of Corollary 2.1 in Kleijn and van der Vaart [27] and add that, under condition (ii), (3.8) and condition (iii) of Theorem 3.1 are satisfied. We conclude that there exists a test sequence satisfying (3.3). Then the assertion of Theorem 3.1 holds. □

The following lemma generalizes Lemma 8.1 in Ghosal et al. [16] to the $n^{-1/2}$-perturbed setting.

Lemma 3.4. Let $(h_n)$ be stochastic and bounded by some $M > 0$. Then

(3.14)  $P_0^n\Big(\int_H \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta) < e^{-(1+C)n\rho^2}\,\Pi_H(K_n(\rho, M))\Big) \le \frac{1}{C^2 n\rho^2}$

for all $C > 0$, $\rho > 0$ and $n \ge 1$.

Proof. See the proof of Lemma 8.1 in Ghosal et al. [16] (dominating the $h_n$-dependent log-likelihood ratio immediately after the first application of Jensen's inequality). □

4. Integrating local asymptotic normality. The smoothness condition in Le Cam's parametric Bernstein-von Mises theorem is a LAN expansion of the likelihood, which is replaced in the semiparametric context by a stochastic LAN expansion of the integrated likelihood (2.5). In this section, we consider sufficient conditions under which the localized integrated likelihood

(4.1)  $s_n(h) = \int_H \prod_{i=1}^n \frac{p_{\theta_0 + n^{-1/2}h,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta)$

has the integral LAN property; that is, $s_n$ allows an expansion of the form

(4.2)  $\log\frac{s_n(h_n)}{s_n(0)} = \frac{1}{\sqrt{n}}\sum_{i=1}^n h_n^T\tilde\ell_{\theta_0,\eta_0}(X_i) - \frac12 h_n^T\tilde I_{\theta_0,\eta_0}h_n + o_{P_0}(1)$

for every random sequence $(h_n) \subset \mathbb{R}^k$ of order $O_{P_0}(1)$, as required in Theorem 5.1. Theorem 4.2 assumes that the model is stochastically LAN and requires consistency under $n^{-1/2}$-perturbation for the nuisance posterior. Consistency not only allows us to restrict sufficient conditions to neighborhoods of $\eta_0$ in $H$, but also enables lifting of the LAN expansion of the integrand in (4.1) to an expansion of the integral $s_n$ itself; cf. (4.2). The posterior concentrates on the least-favorable submodel so that only the least-favorable expansion at $\eta_0$ contributes to (4.2) asymptotically. For this reason, the integral LAN expansion is determined by the efficient score function (and not some other influence function). Ultimately, occurrence of the efficient score lends the marginal posterior (and statistics based upon it) properties of frequentist semiparametric optimality.

To derive Theorem 4.2, we reparametrize the model; cf. (2.7). While yielding adaptivity, this reparametrization also leads to $\theta$-dependence in the prior for $\zeta$, a technical issue that we tackle before addressing the main point of this section. We show that the prior mass of the relevant neighborhoods displays the appropriate type of stability, under a condition on local behavior of Hellinger distances in the least-favorable model. For smooth least-favorable submodels, typically $d_H(\eta^*(\theta_n(h_n)), \eta_0) = O(n^{-1/2})$ for all bounded, stochastic $(h_n)$, which suffices.

Lemma 4.1 (Prior stability). Let $(h_n)$ be a bounded, stochastic sequence of perturbations, and let $\Pi_H$ be any prior on $H$. Let $(\rho_n)$ be such that $d_H(\eta^*(\theta_n(h_n)), \eta_0) = o(\rho_n)$. Then the prior mass of radius-$\rho_n$ neighborhoods of $\eta^*$ is stable, that is,

(4.3)  $\Pi_H(D(\theta_n(h_n), \rho_n)) = \Pi_H(D(\theta_0, \rho_n)) + o(1).$

Proof. Let $(h_n)$ and $(\rho_n)$ be such that $d_H(\eta^*(\theta_n(h_n)), \eta_0) = o(\rho_n)$. Denote $D(\theta_n(h_n), \rho_n)$ by $D_n$ and $D(\theta_0, \rho_n)$ by $C_n$ for all $n \ge 1$. Since

$|\Pi_H(D_n) - \Pi_H(C_n)| \le \Pi_H((D_n \cup C_n) \setminus (D_n \cap C_n)),$

we consider the sequence of symmetric differences. Fix some $0 < \alpha < 1$. Then for all $\eta \in D_n$ and all $n$ large enough, $d_H(\eta, \eta_0) \le d_H(\eta, \eta^*(\theta_n(h_n))) + d_H(\eta^*(\theta_n(h_n)), \eta_0) \le (1 + \alpha)\rho_n$, so that $D_n \cup C_n \subset D(\theta_0, (1 + \alpha)\rho_n)$. Furthermore, for large enough $n$ and any $\eta \in D(\theta_0, (1 - \alpha)\rho_n)$, $d_H(\eta, \eta^*(\theta_n(h_n))) \le d_H(\eta, \eta_0) + d_H(\eta_0, \eta^*(\theta_n(h_n))) \le \rho_n + d_H(\eta_0, \eta^*(\theta_n(h_n))) - \alpha\rho_n < \rho_n$, so that $D(\theta_0, (1 - \alpha)\rho_n) \subset D_n \cap C_n$. Therefore,

$(D_n \cup C_n) \setminus (D_n \cap C_n) \subset D(\theta_0, (1 + \alpha)\rho_n) \setminus D(\theta_0, (1 - \alpha)\rho_n) \to \emptyset,$

which implies (4.3). □

Once stability of the nuisance prior is established, Theorem 4.2 hinges on stochastic local asymptotic normality of the submodels $t \mapsto Q_{\theta_0 + t,\zeta}$, for all $\zeta$ in an $r_H$-neighborhood of $\zeta = 0$. We assume there exists a $g_\zeta \in L_2(Q_{\theta_0,\zeta})$ such that for every random $(h_n)$ bounded in $Q_{\theta_0,\zeta}$-probability,

(4.4)  $\log\prod_{i=1}^n \frac{q_{\theta_0 + n^{-1/2}h_n,\zeta}}{q_{\theta_0,\zeta}}(X_i) = \frac{1}{\sqrt{n}}\sum_{i=1}^n h_n^T g_\zeta(X_i) - \frac12 h_n^T I_\zeta h_n + R_n(h_n, \zeta),$

where $I_\zeta = Q_{\theta_0,\zeta}\,g_\zeta g_\zeta^T$ and $R_n(h_n, \zeta) = o_{Q_{\theta_0,\zeta}}(1)$. Equation (4.4) specifies the (minimal) tangent set (van der Vaart [43], Section 25.4) with respect to which differentiability of the model is required. Note that $g_0 = \tilde\ell_{\theta_0,\eta_0}$.

Theorem 4.2 (Integral local asymptotic normality). Suppose that $\theta \mapsto Q_{\theta,\zeta}$ is stochastically LAN for all $\zeta$ in an $r_H$-neighborhood of $\zeta = 0$. Furthermore, assume that posterior consistency under $n^{-1/2}$-perturbation obtains with a rate $(\rho_n)$ also valid in (2.8). Then the integral LAN-expansion (4.2) holds.

Proof. Throughout this proof $G_n(h, \zeta) = \sqrt{n}\,h^T\mathbb{P}_n g_\zeta - \tfrac12 h^T I_\zeta h$, for all $h$ and all $\zeta$. Furthermore, we abbreviate $\theta_n(h_n)$ to $\theta_n$ and omit explicit notation for $(X_1, \ldots, X_n)$-dependence in several places.

Let $\delta, \varepsilon > 0$ be given, and let $\theta_n = \theta_0 + n^{-1/2}h_n$ with $(h_n)$ bounded in $P_0$-probability. Then there exists a constant $M > 0$ such that $P_0^n(\|h_n\| > M) < \tfrac12\delta$ for all $n \ge 1$. With $(h_n)$ bounded, the assumption of consistency under $n^{-1/2}$-perturbation says that

$P_0^n\big(\log\Pi(D(\theta, \rho_n) \mid \theta = \theta_n; X_1, \ldots, X_n) \ge -\varepsilon\big) > 1 - \tfrac12\delta$

for large enough $n$. This implies that the posterior's numerator and denominator are related through

(4.5)  $P_0^n\Big(\int_H \prod_{i=1}^n \frac{p_{\theta_n,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta) \le e^{\varepsilon}\,1_{\{\|h_n\| \le M\}}\int_{D(\theta_n,\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta)\Big) > 1 - \delta.$

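We continue with the integral over $D(\theta_n, \rho_n)$ under the restriction $\|h_n\| \le M$ and parametrize the model locally in terms of $(\theta, \zeta)$ [see (2.7)],

(4.6)  $\int_{D(\theta_n,\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta) = \int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,0}}(X_i)\,d\Pi(\zeta \mid \theta = \theta_n),$

where $\Pi(\cdot \mid \theta)$ denotes the prior for $\zeta$ given $\theta$, that is, $\Pi_H$ translated over $\eta^*(\theta)$. Next we note that by Fubini's theorem and the domination condition (2.8), there exists a constant $L > 0$ such that

$\Big|P_0^n\int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,0}}(X_i)\,\big(d\Pi(\zeta \mid \theta = \theta_n) - d\Pi(\zeta \mid \theta = \theta_0)\big)\Big| \le L\,\big|\Pi(B(\rho_n) \mid \theta = \theta_n) - \Pi(B(\rho_n) \mid \theta = \theta_0)\big|$

for large enough $n$. Since the least-favorable submodel is stochastically LAN, Lemma 4.1 asserts that the difference on the r.h.s. of the above display is $o(1)$, so that

(4.7)  $\int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,0}}(X_i)\,d\Pi(\zeta \mid \theta = \theta_n) = \int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,0}}(X_i)\,d\Pi(\zeta) + o_{P_0}(1),$

where we use the notation $\Pi(A) = \Pi(\zeta \in A \mid \theta = \theta_0)$ for brevity. We define for all $\zeta$, $\varepsilon > 0$, $n \ge 1$ the events $F_n(\zeta, \varepsilon) = \{\sup_h |G_n(h, \zeta) - G_n(h, 0)| \le \varepsilon\}$. With (2.8) as a domination condition, Fatou's lemma and the fact that $F_n^c(0, \varepsilon) = \emptyset$ lead to

(4.8)  $\limsup_{n \to \infty}\int_{B(\rho_n)} Q^n_{\theta_n,\zeta}(F_n^c(\zeta, \varepsilon))\,d\Pi(\zeta) \le \int \limsup_{n \to \infty} 1_{B(\rho_n) \setminus \{0\}}(\zeta)\,Q^n_{\theta_n,\zeta}(F_n^c(\zeta, \varepsilon))\,d\Pi(\zeta) = 0$

[again using (2.8) in the last step]. Combined with Fubini's theorem, this suffices to conclude that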

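(4.9)  $\int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,0}}(X_i)\,d\Pi(\zeta) = \int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,0}}(X_i)\,1_{F_n(\zeta,\varepsilon)}\,d\Pi(\zeta) + o_{P_0}(1),$

and we continue with the first term on the right-hand side. By stochastic local asymptotic normality for every $\zeta$, expansion (4.4) of the log-likelihood implies that

(4.10)  $\prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,0}}(X_i) = \prod_{i=1}^n \frac{q_{\theta_0,\zeta}}{q_{\theta_0,0}}(X_i)\,e^{G_n(h_n,\zeta) + R_n(h_n,\zeta)},$

where the rest term is of order $o_{Q_{\theta_0,\zeta}}(1)$. Accordingly, we define, for every $\zeta$, the events $A_n(\zeta, \varepsilon) = \{|R_n(h_n, \zeta)| \le \tfrac12\varepsilon\}$, so that $Q^n_{\theta_0,\zeta}(A_n^c(\zeta, \varepsilon)) \to 0$. Contiguity then implies that $Q^n_{\theta_n,\zeta}(A_n^c(\zeta, \varepsilon)) \to 0$ as well. Reasoning as in (4.9) we see that

(4.11)  $\int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,0}}(X_i)\,1_{F_n(\zeta,\varepsilon)}\,d\Pi(\zeta) = \int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,0}}(X_i)\,1_{A_n(\zeta,\varepsilon) \cap F_n(\zeta,\varepsilon)}\,d\Pi(\zeta) + o_{P_0}(1).$
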
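For fixed $n$ and $\zeta$ and for all $(X_1, \ldots, X_n) \in A_n(\zeta, \varepsilon) \cap F_n(\zeta, \varepsilon)$,

$\Big|\log\prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,\zeta}}(X_i) - G_n(h_n, 0)\Big| \le 2\varepsilon,$

so that the first term on the right-hand side of (4.11) satisfies the bounds

(4.12)  $e^{G_n(h_n,0) - 2\varepsilon}\int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_0,\zeta}}{q_{\theta_0,0}}(X_i)\,1_{A_n(\zeta,\varepsilon) \cap F_n(\zeta,\varepsilon)}\,d\Pi(\zeta)$
        $\le \int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_n,\zeta}}{q_{\theta_0,0}}(X_i)\,1_{A_n(\zeta,\varepsilon) \cap F_n(\zeta,\varepsilon)}\,d\Pi(\zeta)$
        $\le e^{G_n(h_n,0) + 2\varepsilon}\int_{B(\rho_n)} \prod_{i=1}^n \frac{q_{\theta_0,\zeta}}{q_{\theta_0,0}}(X_i)\,1_{A_n(\zeta,\varepsilon) \cap F_n(\zeta,\varepsilon)}\,d\Pi(\zeta).$

The integral factored into lower and upper bounds can be relieved of the indicator for $A_n \cap F_n$ by reversing the argument that led to (4.9) and (4.11) (with $\theta_0$ replacing $\theta_n$), at the expense of an $e^{o_{P_0}(1)}$-factor. Substituting in (4.12) and using, consecutively, (4.11), (4.9), (4.7) and (4.5) for the bounded integral, we find

$e^{G_n(h_n,0) - 3\varepsilon + o_{P_0}(1)}\,s_n(0) \le s_n(h_n) \le e^{G_n(h_n,0) + 3\varepsilon + o_{P_0}(1)}\,s_n(0).$

Since this holds with arbitrarily small $0 < \varepsilon' < \varepsilon$ for large enough $n$, it proves (4.2). □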

With regard to the nuisance rate $(\rho_n)$, we first note that our proof of Theorem 2.1 fails if the slowest rate required to satisfy (2.8) vanishes faster than the optimal rate for convergence under $n^{-1/2}$-perturbation [as determined in (3.7) and (3.2)].

However, the rate $(\rho_n)$ does not appear in assertion (4.2), so if said contradiction between conditions (2.8) and (3.7)/(3.2) does not occur, the sequence $(\rho_n)$ can remain entirely internal to the proof of Theorem 4.2. More particularly, if condition (2.8) holds for any $(\rho_n)$ such that $n\rho_n^2 \to \infty$, integral LAN only requires consistency under $n^{-1/2}$-perturbation at some such $(\rho_n)$. In that case, we may appeal to Corollary 3.3 instead of Theorem 3.1, thus relaxing conditions on model entropy and nuisance prior. The following lemma shows that a first-order Taylor expansion of likelihood ratios combined with a boundedness condition on certain Fisher information coefficients is enough to enable use of Corollary 3.3 instead of Theorem 3.1.

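Lemma 4.3. Let $\Theta$ be one-dimensional. Assume that there exists a $\rho > 0$ such that for every $\zeta \in B(\rho)$ and all $x$ in the sample space, the map $\theta \mapsto \log(q_{\theta,\zeta}/q_{\theta_0,\zeta})(x)$ is continuously differentiable on $[\theta_0 - \rho, \theta_0 + \rho]$ with Lebesgue-integrable derivative $g_{\theta,\zeta}(x)$ such that

(4.13)  $\sup_{\zeta \in B(\rho)}\ \sup_{\{\theta : |\theta - \theta_0| < \rho\}} Q_{\theta,\zeta}\,g_{\theta,\zeta}^2 < \infty.$

Then, for every $\rho_n \downarrow 0$ and all bounded, stochastic $(h_n)$, $U_n(\rho_n, h_n) = O(1)$.

Proof. Let $(h_n)$ be stochastic and upper-bounded by $M > 0$. For every $\zeta$ and all $n \ge 1$,

$\Big|Q^n_{\theta_0,\zeta}\Big(\prod_{i=1}^n \frac{q_{\theta_n(h_n),\zeta}}{q_{\theta_0,\zeta}}(X_i) - 1\Big)\Big| = \Big|Q^n_{\theta_0,\zeta}\int_{\theta_0}^{\theta_n(h_n)} \sum_{i=1}^n g_{\theta',\zeta}(X_i)\prod_{j=1}^n \frac{q_{\theta',\zeta}}{q_{\theta_0,\zeta}}(X_j)\,d\theta'\Big|$
$\le \int_{\theta_0 - M/\sqrt{n}}^{\theta_0 + M/\sqrt{n}} Q^n_{\theta',\zeta}\Big|\sum_{i=1}^n g_{\theta',\zeta}(X_i)\Big|\,d\theta' \le \sqrt{n}\int_{\theta_0 - M/\sqrt{n}}^{\theta_0 + M/\sqrt{n}} \sqrt{Q_{\theta',\zeta}\,g_{\theta',\zeta}^2}\,d\theta',$

where the last step follows from the Cauchy-Schwarz inequality. For large enough $n$, $\rho_n < \rho$ and the square-root of (4.13) dominates the difference between $U_n(\rho_n, h_n)$ and 1. □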

5. Posterior asymptotic normality. Under the assumptions formulated before Theorem 2.1, the marginal posterior density $\pi_n(\cdot \mid X_1, \ldots, X_n) : \Theta \to \mathbb{R}$ for the parameter of interest with respect to the prior $\Pi_\Theta$ equals

(5.1)  $\pi_n(\theta \mid X_1, \ldots, X_n) = S_n(\theta)\Big/\int_\Theta S_n(\theta')\,d\Pi_\Theta(\theta'),$

$P_0^n$-almost-surely. One notes that this form is equal to that of a parametric posterior density, but with the parametric likelihood replaced by the integrated likelihood $S_n$. By implication, the proof of the parametric Bernstein-von Mises theorem can be applied to its semiparametric generalization, if we impose sufficient conditions for the parametric likelihood on $S_n$ instead. Concretely, we replace the smoothness requirement for the likelihood in Theorem 1.1 by (4.2). Together with a condition expressing marginal posterior convergence at parametric rate, (4.2) is sufficient to derive asymptotic normality of the posterior; cf. (1.2).

Theorem 5.1 (Posterior asymptotic normality). Let $\Theta$ be open in $\mathbb{R}^k$ with a prior $\Pi_\Theta$ that is thick at $\theta_0$. Suppose that for large enough $n$, the map $h \mapsto s_n(h)$ is continuous $P_0^n$-almost-surely. Assume that there exists an $L_2(P_0)$-function $\tilde\ell_{\theta_0,\eta_0}$ such that for every $(h_n)$ that is bounded in probability, (4.2) holds, $P_0\tilde\ell_{\theta_0,\eta_0} = 0$ and $\tilde I_{\theta_0,\eta_0}$ is nonsingular. Furthermore suppose that for every $(M_n)$, $M_n \to \infty$, we have

(5.2)  $\Pi_n(\|h\| \le M_n \mid X_1, \ldots, X_n) \xrightarrow{P_0} 1.$

Then the sequence of marginal posteriors for $\theta$ converges to a normal distribution in total variation,

$\sup_A \big|\Pi_n(h \in A \mid X_1, \ldots, X_n) - N_{\tilde\Delta_n, \tilde I^{-1}_{\theta_0,\eta_0}}(A)\big| \xrightarrow{P_0} 0,$

centered on $\tilde\Delta_n$ with covariance matrix $\tilde I^{-1}_{\theta_0,\eta_0}$.

Proof. The proof is identical to that of Theorem 2.1 in [26] upon replacement of parametric likelihoods with integrated likelihoods. □

There is room for relaxation of the requirements on model entropy and minimal prior mass, if the limit (2.8) holds in a fixed neighborhood of $\eta_0$. The following corollary applies whenever (2.8) holds for any rate $(\rho_n)$. The simplifications are such that the entropy and prior mass conditions become comparable to those for Schwartz's posterior consistency theorem [37], rather than those for posterior rates of convergence following Ghosal, Ghosh and van der Vaart [16].

Corollary 5.2 (Semiparametric Bernstein-von Mises, rate-free). Let $X_1, X_2, \ldots$ be i.i.d.-$P_0$, with $P_0 \in \mathscr{P}$, and let $\Pi_\Theta$ be thick at $\theta_0$. Suppose that for large enough $n$, the map $h \mapsto s_n(h)$ is continuous $P_0^n$-almost-surely. Also assume that $\theta \mapsto Q_{\theta,\zeta}$ is stochastically LAN in the $\theta$-direction, for all $\zeta$ in an $r_H$-neighborhood of $\zeta = 0$ and that the efficient Fisher information $\tilde I_{\theta_0,\eta_0}$ is nonsingular. Furthermore, assume that:

(i) For all $\rho > 0$, the Hellinger metric entropy satisfies $N(\rho, H, d_H) < \infty$ and the nuisance prior satisfies $\Pi_H(K(\rho)) > 0$.

(ii) For every $M > 0$, there exists an $L > 0$ such that for all $\rho > 0$ and large enough $n$, $K(\rho) \subset K_n(L\rho, M)$.

Assume also that for every bounded, stochastic $(h_n)$:
(iii) There exists an $r > 0$ such that $U_n(r, h_n) = O(1)$.
(iv) Hellinger distances satisfy $\sup_{\eta \in H} H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta}) = O(n^{-1/2})$,
and that
(v) For every $(M_n)$, $M_n \to \infty$, the posterior satisfies

$\Pi_n(\|h\| \le M_n \mid X_1, \ldots, X_n) \xrightarrow{P_0} 1.$

Then the sequence of marginal posteriors for $\theta$ converges in total variation to a normal distribution,

$\sup_A \big|\Pi_n(h \in A \mid X_1, \ldots, X_n) - N_{\tilde\Delta_n, \tilde I^{-1}_{\theta_0,\eta_0}}(A)\big| \xrightarrow{P_0} 0,$

centered on $\tilde\Delta_n$ with covariance matrix $\tilde I^{-1}_{\theta_0,\eta_0}$.

Proof. Under conditions (i), (ii), (iv) and the stochastic LAN assumption, the assertion of Corollary 3.3 holds. Due to condition (iii), condition (2.8) is satisfied for large enough $n$. Condition (v) then suffices for the assertion of Theorem 5.1. □

A critical note can be made regarding the qualification "rate-free" of Corollary 5.2: although the nuisance rate does not make an explicit appearance, rate restrictions may arise upon further analysis of condition (v). Indeed this is the case in the example of Section 7, where smoothness requirements on the regression family are interpretable as restrictions on the nuisance rate. However, semiparametric models exist in which no restrictions on nuisance rates arise in this way: if $H$ is a convex subspace of a linear space, and the dependence $\eta \mapsto P_{\theta,\eta}$ is linear (a so-called convex-linear model, e.g., mixture models, errors-in-variables regression and other information-loss models), the construction of suitable tests (cf. Le Cam [30], Birgé [2, 3]) does not involve Hellinger metric entropy numbers or restrictions on nuisance rates of convergence. Consequently there exists a class of semiparametric examples for which Corollary 5.2 stays rate-free even after further analysis of its condition (v).

As shown in [26], the particular form of the limiting posterior in Theo- 
rem 5.1 is a consequence of local asymptotic normality, in this case imposed 
through (4.2). The marginal posterior converges exactly to the asymptotic 
sampling distribution of a frequentist best-regular estimator as a conse- 
quence. Other expansions (e.g., in LAN models for non-i.i.d. data or under 
the condition of local asymptotic exponentiality (Ibragimov and Has'mins- 
kii [19])) can be dealt with in the same manner if we adapt the limiting form 
of the posterior accordingly, giving rise to other (e.g., one-sided exponential) 
limit distributions (see Kleijn and Knapik [25]). 

6. Marginal posterior convergence at parametric rate. Condition (5.2) in Theorem 5.1 requires that the posterior measures of a sequence of model subsets of the form

(6.1)  $\Theta_n \times H = \{(\theta, \eta) \in \Theta \times H : \sqrt{n}\,\|\theta - \theta_0\| \le M_n\}$

converge to one in $P_0$-probability, for every sequence $(M_n)$ such that $M_n \to \infty$. Essentially, this condition enables us to restrict the proof of Theorem 5.1 to the shrinking domain in which (4.2) applies. In this section, we consider two distinct approaches: the first (Lemma 6.1) is based on bounded likelihood ratios (see also condition (B3) of Theorem 8.2 in Lehmann and Casella [32]). The second is based on the behavior of misspecified parametric posteriors (Theorem 6.2). The latter construction illustrates the intricacy of this section's subject most clearly and provides some general insight. Methods proposed here are neither compelling nor exhaustive; we simply put forth several possible approaches and demonstrate the usefulness of one of them in Section 7.

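Lemma 6.1 [Marginal parametric rate (I)]. Let the sequence of maps $\theta \mapsto S_n(\theta)$ be $P_0$-almost-surely continuous and such that (4.2) is satisfied. Furthermore, assume that there exists a constant $C > 0$ such that for any $(M_n)$, $M_n \to \infty$,

(6.2)  $P_0^n\Big(\sup_{\eta \in H}\ \sup_{\theta \in \Theta_n^c} \mathbb{P}_n\log\frac{p_{\theta,\eta}}{p_{\theta_0,\eta}} \le -\frac{CM_n^2}{n}\Big) \to 1.$

Then, for any nuisance prior $\Pi_H$ and parametric prior $\Pi_\Theta$, thick at $\theta_0$,

(6.3)  $\Pi(n^{1/2}\|\theta - \theta_0\| > M_n \mid X_1, \ldots, X_n) \xrightarrow{P_0} 0$

for any $(M_n)$, $M_n \to \infty$.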




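Proof. Let $(M_n)$, $M_n \to \infty$ be given. Define $(A_n)$ to be the events in (6.2) so that $P_0^n(A_n^c) = o(1)$ by assumption. In addition, let

$B_n = \Big\{\int_\Theta S_n(\theta)\,d\Pi_\Theta(\theta) \ge e^{-CM_n^2/2}\,S_n(\theta_0)\Big\}.$

By (4.2) and Lemma 6.3, $P_0^n(B_n^c) = o(1)$ as well. Then

$P_0^n\,\Pi(\theta \in \Theta_n^c \mid X_1, \ldots, X_n) \le P_0^n\,\Pi(\theta \in \Theta_n^c \mid X_1, \ldots, X_n)\,1_{A_n \cap B_n} + o(1)$
$\le e^{CM_n^2/2}\,P_0^n\Big(S_n(\theta_0)^{-1}\int_H\int_{\Theta_n^c} \prod_{i=1}^n \frac{p_{\theta,\eta}}{p_{\theta_0,\eta}}(X_i)\prod_{i=1}^n \frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_\Theta(\theta)\,d\Pi_H(\eta)\,1_{A_n}\Big) + o(1)$
$= o(1),$

which proves (6.3). □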

Although applicable directly in the model of Section 7, most other examples would require variations. Particularly, if the full, nonparametric posterior is known to concentrate on a sequence of model subsets $(V_n)$, then Lemma 6.1 can be preceded by a decomposition of $\Theta \times H$ over $V_n$ and $V_n^c$, reducing condition (6.2) to a supremum over $V_n^c$ (see Section 2.4 in Kleijn [24] and the discussion following the next theorem).

Our second approach assumes such concentration of the posterior on 
model subsets, for example, deriving from nonparametric consistency in 
a suitable form. Though the proof of Theorem 6.2 is rather straightforward, 
combination with results in misspecified parametric models [26] leads to the 
observation that marginal parametric rates of convergence can be ruined by 
a bias. 

Theorem 6.2 [Marginal parametric rate (II)]. Let $\Pi_\Theta$ and $\Pi_H$ be given. Assume that there exists a sequence $(H_n)$ of subsets of $H$, such that the following two conditions hold:

(i) The nuisance posterior concentrates on $H_n$ asymptotically,

(6.4)  $\Pi(\eta \in H \setminus H_n \mid X_1, \ldots, X_n) \xrightarrow{P_0} 0.$

(ii) For every $(M_n)$, $M_n \to \infty$,

(6.5)  $P_0^n \sup_{\eta \in H_n} \Pi(n^{1/2}\|\theta - \theta_0\| > M_n \mid \eta, X_1, \ldots, X_n) \to 0.$

Then the marginal posterior for $\theta$ concentrates at parametric rate, that is,

$\Pi(n^{1/2}\|\theta - \theta_0\| > M_n \mid X_1, \ldots, X_n) \xrightarrow{P_0} 0$

for every sequence $(M_n)$, $M_n \to \infty$.

Proof. Let $(M_n)$, $M_n \to \infty$ be given, and consider the posterior for the complement of (6.1). By assumption (i) of the theorem and Fubini's theorem,

$P_0^n\,\Pi(\theta \in \Theta_n^c \mid X_1, \ldots, X_n) \le P_0^n\int_{H_n} \Pi(\theta \in \Theta_n^c \mid \eta, X_1, \ldots, X_n)\,d\Pi(\eta \mid X_1, \ldots, X_n) + o(1)$
$\le P_0^n \sup_{\eta \in H_n} \Pi(n^{1/2}\|\theta - \theta_0\| > M_n \mid \eta, X_1, \ldots, X_n) + o(1),$

the first term of which is $o(1)$ by assumption (ii) of the theorem. □

Condition (ii) of Theorem 6.2 has an interpretation in terms of misspecified parametric models (Kleijn and van der Vaart [26] and Kleijn [24]). For fixed $\eta \in H$, the $\eta$-conditioned posterior on the parametric model $\mathscr{P}_\eta = \{P_{\theta,\eta} : \theta \in \Theta\}$ is required to concentrate in $n^{-1/2}$-neighborhoods of $\theta_0$ under $P_0$. However, this misspecified posterior concentrates around $\Theta^*(\eta) \subset \Theta$, the set of points in $\Theta$ where the Kullback-Leibler divergence of $P_{\theta,\eta}$ with respect to $P_0$ is minimal. Assuming that $\Theta^*(\eta)$ consists of a unique minimizer $\theta^*(\eta)$, the dependence of the Kullback-Leibler divergence on $\eta$ must be such that

(6.6)  $\sup_{\eta \in H_n} \|\theta^*(\eta) - \theta_0\| = o(n^{-1/2})$

in order for posterior concentration to occur on the strips (6.1). In other words, minimal Kullback-Leibler divergence may bias the (points of convergence of) $\eta$-conditioned parametric posteriors to such an extent that consistency of the marginal posterior for $\theta$ is ruined.
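
In the partial linear model of Section 7 the minimizer can be written down explicitly, which makes the bias in (6.6) concrete. There the Kullback-Leibler divergence equals $\tfrac12 P((\theta - \theta_0)U + (\eta - \eta_0)(V))^2$, so for fixed $\eta$ it is minimized at $\theta^*(\eta) = \theta_0 - P[U(\eta - \eta_0)(V)]/PU^2$. The sketch below evaluates the empirical analogue of this expression on a small sample; the function names and the toy inputs are illustrative assumptions, not part of the paper.

```python
def kl_minimizer(theta0, u, delta):
    """Empirical minimizer over theta of mean(((theta - theta0) * u + delta)^2) / 2.

    u     : sample values of the covariate U
    delta : values of (eta - eta0)(V) at the same sample points
    """
    puu = sum(ui * ui for ui in u) / len(u)
    pud = sum(ui * di for ui, di in zip(u, delta)) / len(u)
    return theta0 - pud / puu


def kl_objective(theta, theta0, u, delta):
    """Empirical divergence (1/2) * mean(((theta - theta0) * u + delta)^2)."""
    total = sum(((theta - theta0) * ui + di) ** 2 for ui, di in zip(u, delta))
    return 0.5 * total / len(u)


if __name__ == "__main__":
    u = [-1.5, -0.5, 0.0, 0.5, 1.5]
    delta = [0.2 * ui for ui in u]      # nuisance error correlated with U
    print(kl_minimizer(1.0, u, delta))  # shifted away from theta0 = 1.0
```

A nuisance error that is uncorrelated with $U$ leaves the minimizer at $\theta_0$, while any component of $\eta - \eta_0$ correlated with $U$ shifts it; (6.6) asks that this shift be $o(n^{-1/2})$ uniformly over $H_n$.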

The occurrence of this bias is a property of the semiparametric model 
rather than a peculiarity of the Bayesian approach: when (point-)estimating 
with solutions to score equations, for example, the same bias occurs (see, 
e.g., Theorem 25.59 in [43] and subsequent discussion). Frequentist literature
also offers some guidance toward mitigation of this circumstance. First of 
all, it is noted that the bias indicates the existence of a better (i.e., bias- 
less) choice of parametrization to ask the relevant semiparametric question. 
If the parametrization is fixed, alternative point-estimation methods may 
resolve bias, for example, through replacement of score equations by general 
estimating equations (see, e.g., Section 25.9 in [43]), loosely equivalent to
introducing a suitable penalty in a likelihood maximization procedure. 

For a so-called curve-alignment model with Gaussian prior, the no-bias problem has been addressed and resolved in a fully Bayesian manner by Castillo [5]: like a penalty in an ML procedure, Castillo's (rather subtle choice of) prior guides the procedure away from the biased directions and produces Bernstein-von Mises efficiency of the marginal posterior. A most interesting question concerns generalization of Castillo's intricate construction to more general Bayesian context.

Recalling definitions (2.5) and (4.1), we conclude this section with a lemma 
used in the proof of Lemma 6.1 to lower-bound the denominator of the 
marginal posterior. 

Lemma 6.3. Let the sequence of maps $\theta \mapsto S_n(\theta)$ be $P_0$-almost-surely continuous and such that (4.2) is satisfied. Assume that $\Pi_\Theta$ is thick at $\theta_0$ and denoted by $\Pi_n$ in the local parametrization in terms of $h$. Then

(6.7)  $P_0^n\Big(\int s_n(h)\,d\Pi_n(h) < a_n\,s_n(0)\Big) \to 0$

for every sequence $(a_n)$, $a_n \downarrow 0$.

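Proof. Let $M > 0$ be given, and define $C = \{h : \|h\| \le M\}$. Denote the rest-term in (4.2) by $h \mapsto R_n(h)$. By continuity of $\theta \mapsto S_n(\theta)$, $\sup_{h \in C}|R_n(h)|$ converges to zero in $P_0$-probability. If we choose a sequence $(\kappa_n)$ that converges to zero slowly enough, the corresponding events $B_n = \{\sup_{h \in C}|R_n(h)| \le \kappa_n\}$ satisfy $P_0^n(B_n) \to 1$. Next, let $(K_n)$, $K_n \to \infty$ be given. There exists a $\pi > 0$ such that $\inf_{h \in C} d\Pi_n/d\mu(h) \ge \pi$, for large enough $n$. Combining, we find

(6.8)  $P_0^n\Big(\int \frac{s_n(h)}{s_n(0)}\,d\Pi_n(h) \le e^{-K_n^2}\Big) \le P_0^n\Big(\Big\{\int_C \frac{s_n(h)}{s_n(0)}\,d\mu(h) \le \pi^{-1}e^{-K_n^2}\Big\} \cap B_n\Big) + o(1).$

On $B_n$, the integral LAN expansion is lower bounded so that, for large enough $n$,

(6.9)  $P_0^n\Big(\Big\{\int_C \frac{s_n(h)}{s_n(0)}\,d\mu(h) \le \pi^{-1}e^{-K_n^2}\Big\} \cap B_n\Big) \le P_0^n\Big(\int_C e^{h^T\mathbb{G}_n\tilde\ell_{\theta_0,\eta_0}}\,d\mu(h) \le \pi^{-1}e^{-K_n^2/4}\Big),$

since $\kappa_n \le \tfrac12 K_n^2$ and $\sup_{h \in C}|h^T\tilde I_{\theta_0,\eta_0}h| \le M^2\|\tilde I_{\theta_0,\eta_0}\| \le \tfrac14 K_n^2$ for large enough $n$. Conditioning $\mu$ on $C$, we apply Jensen's inequality to note that, for large enough $n$,

$P_0^n\Big(\int_C e^{h^T\mathbb{G}_n\tilde\ell_{\theta_0,\eta_0}}\,d\mu(h) \le \pi^{-1}e^{-K_n^2/4}\Big) \le P_0^n\Big(\int h^T\mathbb{G}_n\tilde\ell_{\theta_0,\eta_0}\,d\mu(h \mid C) \le -\tfrac18 K_n^2\Big),$

since $-\log\pi\mu(C) \le \tfrac18 K_n^2$, for large enough $n$. The probability on the right is bounded further by Chebyshev's and Jensen's inequalities and can be shown to be of order $O(K_n^{-2})$. Combining with (6.8) and (6.9) then proves (6.7). □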

7. Semiparametric regression. The partial linear regression model describes the observation of an i.i.d. sample $X_1, X_2, \ldots$ of triplets $X_i = (U_i, V_i, Y_i) \in \mathbb{R}^3$, each assumed to be related through the regression equation

(7.1)  $Y = \theta_0 U + \eta_0(V) + e,$

where $e \sim N(0, 1)$ is independent of $(U, V)$. Interpreting $\eta_0$ as a nuisance parameter, we wish to estimate $\theta_0$. It is assumed that $(U, V)$ has an unknown distribution $P$, Lebesgue absolutely continuous with density $p : \mathbb{R}^2 \to \mathbb{R}$. The distribution $P$ is assumed to be such that $PU = 0$, $PU^2 = 1$ and $PU^4 < \infty$. At a later stage, we also impose $P(U - E[U|V])^2 > 0$ and a smoothness condition on the conditional expectation $v \mapsto E[U|V = v]$.

As is well known [1, 7, 33, 43], penalized ML estimation in a smoothness 
class of regression functions leads to a consistent estimate of the nuisance 
and efficient point-estimation of the parameter of interest. The necessity of 
a penalty signals that the choice of a prior for the nuisance is a critical one. 
Kimeldorf and Wahba [23] assume that the regression function lies in the 
Sobolev space H [0, 1] (see [44] for definition), and define the nuisance prior 
through the Gaussian process

(7.2)  $\eta(t) = \sum_{j=0}^{k} Z_j\frac{t^j}{j!} + (I^k_{0+}W)(t),$

where $W = \{W_t : t \in [0, 1]\}$ is Brownian motion on $[0, 1]$, $(Z_0, \ldots, Z_k)$ form a $W$-independent, $N(0, 1)$-i.i.d. sample and $I^k_{0+}$ denotes $(I^1_{0+}f)(t) = \int_0^t f(s)\,ds$ or $I^{i+1}_{0+}f = I^1_{0+}I^i_{0+}f$ for all $i \ge 1$. The prior process $\eta$ is zero-mean Gaussian of (Hölder-)smoothness $k + 1/2$ and the resulting posterior mean for $\eta$ concentrates asymptotically on the smoothing spline that solves the penalized ML problem [39, 46]. MCMC simulations based on Gaussian priors have been carried out by Shively, Kohn and Wood [41].
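
A draw from a prior of this type can be simulated on a grid. The sketch below is an illustration only: it reads (7.2) as a random polynomial $\sum_{j \le k} Z_j t^j/j!$ plus $k$-fold integrated Brownian motion, simulates $W$ by cumulative Gaussian increments and approximates each integration by a left Riemann sum. The grid size, the quadrature rule and the function name are assumptions made here.

```python
import math
import random


def sample_prior_path(k, n_grid=200, seed=0):
    """Draw one discretized path of eta(t) = sum_{j<=k} Z_j t^j / j! + (I^k W)(t) on [0, 1]."""
    rng = random.Random(seed)
    dt = 1.0 / n_grid
    grid = [i * dt for i in range(n_grid + 1)]

    # Brownian motion on the grid, started at W_0 = 0.
    path = [0.0]
    for _ in range(n_grid):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))

    # k-fold integration from 0, each step a left Riemann sum.
    for _ in range(k):
        integrated = [0.0]
        for i in range(n_grid):
            integrated.append(integrated[-1] + path[i] * dt)
        path = integrated

    # Independent N(0, 1) coefficients of the polynomial part.
    z = [rng.gauss(0.0, 1.0) for _ in range(k + 1)]
    eta = [
        sum(z[j] * t ** j / math.factorial(j) for j in range(k + 1)) + w
        for t, w in zip(grid, path)
    ]
    return grid, eta, z


if __name__ == "__main__":
    grid, eta, z = sample_prior_path(k=1)
    print(eta[0], eta[-1])
```

Since the integrated part vanishes at $t = 0$, the value of the path at the origin is just $Z_0$; larger $k$ gives visibly smoother paths.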

Here, we reiterate the question of how frequentist sufficient conditions are 
expressed in a Bayesian analysis based on Corollary 5.2. We show that with 
a nuisance of known (Hölder-)smoothness greater than 1/2, the process (7.2) provides a prior such that the marginal posterior for $\theta$ satisfies the Bernstein-von Mises limit. To facilitate the analysis, we think of the regression function and the process (7.2) as elements of the Banach space $(C[0, 1], \|\cdot\|_\infty)$. At a later stage, we relate to Banach subspaces with stronger norms to complete the argument.

Theorem 7.1. Let $X_1, X_2, \ldots$ be an i.i.d. sample from the partial linear model (7.1) with $P_0 = P_{\theta_0,\eta_0}$ for some $\theta_0 \in \Theta$, $\eta_0 \in H$. Assume that $H$ is a subset of $C[0, 1]$ of finite metric entropy with respect to the uniform norm and that $H$ forms a $P_0$-Donsker class. Regarding the distribution of $(U, V)$, suppose that $PU = 0$, $PU^2 = 1$ and $PU^4 < \infty$, as well as $P(U - E[U|V])^2 > 0$, $P(U - E[U|V])^4 < \infty$ and $v \mapsto E[U|V = v] \in H$. Endow $\Theta$ with a prior that is thick at $\theta_0$ and $C[0, 1]$ with a prior $\Pi_H$ such that $H \subset \mathrm{supp}(\Pi_H)$. Then the marginal posterior for $\theta$ satisfies the Bernstein-von Mises limit,

(7.3)  $\sup_B \big|\Pi(\sqrt{n}(\theta - \theta_0) \in B \mid X_1, \ldots, X_n) - N_{\tilde\Delta_n, \tilde I^{-1}_{\theta_0,\eta_0}}(B)\big| \xrightarrow{P_0} 0,$

where $\tilde\ell_{\theta_0,\eta_0}(X) = e(U - E[U|V])$ and $\tilde I_{\theta_0,\eta_0} = P(U - E[U|V])^2$.

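Proof. For any $\theta$ and $\eta$, $-P_{\theta_0,\eta_0}\log(p_{\theta,\eta}/p_{\theta_0,\eta_0}) = \tfrac12 P_{\theta_0,\eta_0}((\theta - \theta_0)U + (\eta - \eta_0)(V))^2$, so that for fixed $\theta$, minimal KL-divergence over $H$ obtains at $\eta^*(\theta) = \eta_0 - (\theta - \theta_0)E[U|V]$, $P$-almost-surely. For fixed $\zeta$, the submodel $\theta \mapsto Q_{\theta,\zeta}$ satisfies

(7.4)  $\log\prod_{i=1}^n \frac{p_{\theta_0 + n^{-1/2}h_n,\,\eta^*(\theta_0 + n^{-1/2}h_n) + \zeta}}{p_{\theta_0,\eta_0 + \zeta}}(X_i) = \frac{h_n}{\sqrt{n}}\sum_{i=1}^n g_\zeta(X_i) - \frac12 h_n^2\,P_{\theta_0,\eta_0 + \zeta}\,g_\zeta^2 - \frac12 h_n^2\,(\mathbb{P}_n - P)(U - E[U|V])^2$

for all stochastic $(h_n)$, with $g_\zeta(X) = e(U - E[U|V])$, $e = Y - \theta_0 U - (\eta_0 + \zeta)(V) \sim N(0, 1)$ under $P_{\theta_0,\eta_0 + \zeta}$. Since $PU^4 < \infty$, the last term on the right is $o_{P_{\theta_0,\eta_0 + \zeta}}(1)$ if $(h_n)$ is bounded in probability. We conclude that $\theta \mapsto Q_{\theta,\zeta}$ is stochastically LAN. In addition, (7.4) shows that $h \mapsto s_n(h)$ is continuous for every $n \ge 1$. By assumption, $\tilde I_{\theta_0,\eta_0} = P_0 g_0^2 = P(U - E[U|V])^2$ is strictly positive. We also observe at this stage that $H$ is totally bounded in $C[0, 1]$, so that there exists a constant $D > 0$ such that $\|H\|_\infty \le D$.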
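
The expansion rests on an exact algebraic identity for Gaussian errors: with $W = U - E[U|V]$ and $e = Y - \theta_0 U - (\eta_0 + \zeta)(V)$, the log-likelihood ratio along the least-favorable direction equals $(h/\sqrt{n})\sum_i e_i W_i - \tfrac12 h^2\,\mathbb{P}_n W^2$. The sketch below checks this identity on simulated data; the particular choices of $\eta_0$, $\zeta$ and $E[U|V]$, the sampling scheme and all function names are illustrative assumptions.

```python
import math
import random


def log_lik_ratio(h, theta0, data, eta0, zeta, cond_mean):
    """Sum of log p_{theta_n, eta*(theta_n)+zeta} / p_{theta0, eta0+zeta} over the sample.

    theta_n = theta0 + h / sqrt(n) and eta*(theta) = eta0 - (theta - theta0) * E[U|V].
    """
    t = h / math.sqrt(len(data))
    total = 0.0
    for u, v, y in data:
        res0 = y - theta0 * u - eta0(v) - zeta(v)
        res1 = y - (theta0 + t) * u - (eta0(v) - t * cond_mean(v)) - zeta(v)
        total += -0.5 * res1 ** 2 + 0.5 * res0 ** 2
    return total


def lan_expression(h, theta0, data, eta0, zeta, cond_mean):
    """(h / sqrt(n)) * sum e_i W_i - (h^2 / 2) * mean(W_i^2), with W = U - E[U|V]."""
    n = len(data)
    s_ew = 0.0
    s_ww = 0.0
    for u, v, y in data:
        e = y - theta0 * u - eta0(v) - zeta(v)
        w = u - cond_mean(v)
        s_ew += e * w
        s_ww += w * w
    return h / math.sqrt(n) * s_ew - 0.5 * h * h * s_ww / n


def simulate(n, theta0, eta0, cond_mean, seed=0):
    """Draw (U, V, Y): V uniform on [0, 1], U = E[U|V] + N(0, 1) noise, Y as in (7.1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        v = rng.random()
        u = cond_mean(v) + rng.gauss(0.0, 1.0)
        y = theta0 * u + eta0(v) + rng.gauss(0.0, 1.0)
        data.append((u, v, y))
    return data


if __name__ == "__main__":
    cond_mean = lambda v: v - 0.5
    data = simulate(100, 1.0, math.sin, cond_mean)
    zeta = lambda v: 0.1 * v
    print(log_lik_ratio(0.7, 1.0, data, math.sin, zeta, cond_mean))
    print(lan_expression(0.7, 1.0, data, math.sin, zeta, cond_mean))
```

The two printed values agree up to rounding for any sample, which is why the remainder in the LAN expansion reduces to the fluctuation of $\mathbb{P}_n W^2$ around $PW^2$.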

For any $x \in \mathbb{R}^3$ and all $\zeta$, the map $\theta \mapsto \log(q_{\theta,\zeta}/q_{\theta_0,\zeta})(x)$ is continuously differentiable on all of $\Theta$, with score $g_{\theta,\zeta}(X) = e(U - E[U|V]) + (\theta - \theta_0)(U - E[U|V])^2$. Since $Q_{\theta,\zeta}\,g_{\theta,\zeta}^2 = P(U - E[U|V])^2 + (\theta - \theta_0)^2 P(U - E[U|V])^4$ does not depend on $\zeta$ and is bounded over $\theta \in [\theta_0 - \rho, \theta_0 + \rho]$, Lemma 4.3 says that $U_n(\rho_n, h_n) = O(1)$ for all $\rho_n \downarrow 0$ and all bounded, stochastic $(h_n)$. So for this model, we can apply the rate-free version of the semiparametric Bernstein-von Mises theorem, Corollary 5.2, and its condition (iii) is satisfied.

Regarding condition (ii) of Corollary 5.2, we first note that, for $M > 0$, $n \ge 1$, $\eta \in H$,

$\sup_{\|h\| \le M} -\log\frac{p_{\theta_n(h),\eta}}{p_{\theta_0,\eta_0}}(X) = \frac{M^2}{2n}U^2 + \frac{M}{\sqrt{n}}\big|U(e - (\eta - \eta_0)(V))\big| - e(\eta - \eta_0)(V) + \frac12(\eta - \eta_0)^2(V),$

where $e \sim N(0, 1)$ under $P_{\theta_0,\eta_0}$. With the help of the boundedness of $H$, the independence of $e$ and $(U, V)$ and the assumptions on the distribution of $(U, V)$, it is then verified that condition (ii) of Corollary 5.2 holds. Turning to condition (i), it is noted that for all $\eta_1, \eta_2 \in H$, $d_H(\eta_1, \eta_2) \le -P_{\theta_0,\eta_2}\log(p_{\theta_0,\eta_1}/p_{\theta_0,\eta_2}) = \tfrac12\|\eta_1 - \eta_2\|^2_{2,P} \le \tfrac12\|\eta_1 - \eta_2\|^2_\infty$. Hence, for any $\rho > 0$, $N(\rho, \mathscr{P}_{\theta_0}, d_H) \le N((2\rho)^{1/2}, H, \|\cdot\|_\infty) < \infty$. Similarly, one shows that for all $\eta$ both $-P_0\log(p_{\theta_0,\eta}/p_{\theta_0,\eta_0})$ and $P_0(\log(p_{\theta_0,\eta}/p_{\theta_0,\eta_0}))^2$ are bounded by $(1 + D^2)\|\eta - \eta_0\|^2_\infty$. Hence, for any $\rho > 0$, $K(\rho)$ contains a $\|\cdot\|_\infty$-ball. Since $\eta_0 \in \mathrm{supp}(\Pi_H)$, we see that condition (i) of Corollary 5.2 holds. Noting that $(p_{\theta_n(h),\eta}/p_{\theta_0,\eta}(X))^{1/2} = \exp((h/2\sqrt{n})\,eU - (h^2/4n)\,U^2)$, one derives the $\eta$-independent upper bound,

H\Pe„iK^U.Pe,,r^ < ^PU' + ^PU' = 0{n-^) 

for all bounded, stochastic $(h_n)$, so that condition (iv) of Corollary 5.2 holds.

Concerning condition (v), let (M^), Mn — )• oo be given and define G^ as 

in Section 6. Rewrite sup^g^ sup^ge^ ^n log{pe,n/Peo,v) = supeee^ ((6* - 6*0) x 

(sup^PnZVF) - ^{d - dofFnW^), w'here Z = cq - C{V), W = U - E[U\V]. 

The maximum-likelihood estimate On for 6 is therefore of the form 6n = 
Oq + Rn, where i?„ = sup^P^ZW^/P^H^^ Note that PqZW = and that H 
is assumed to be Po-Donsker, so that sup.^ GnZW is asymptotically tight. 
Since, in addition, PnVF^ — )■ PqW^ almost surely and the limit is strictly 
positive by assumption, P^(y^|i?n| > jMn) = o(l). Hence, 

Po"fsupsupP„log^^>-^^') 

1,. .,M„ 1,^ ..A^...2- cm: 



Since $P_0W^2 > 0$, there exists a $C > 0$ small enough such that the first term on
the right-hand side is of order $o(1)$ as well, which shows that condition (6.2)
is satisfied. Lemma 6.1 asserts that condition (v) of Corollary 5.2 is met as
well. Assertion (7.3) now holds. □
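
The closed form in the first display of the proof can be checked numerically. The sketch below assumes the partial linear model $Y = \theta U + \eta(V) + e$ with standard normal errors and $\theta_n(h) = \theta_0 + h/\sqrt{n}$; both are taken as assumptions here, since neither (7.1) nor the definition of $\theta_n(h)$ is restated in this section, and the function names are illustrative. Because the negative log-likelihood ratio is convex in $h$, its supremum over $|h| \leq M$ is attained at $h = \pm M$, so a grid that contains both endpoints should reproduce the closed form up to rounding.

```python
import math
import random


def neg_log_ratio(h, n, u, e, d):
    """-log(p_{theta_n(h),eta} / p_{theta_0,eta_0}) at one observation.

    Assumes Y = theta*U + eta(V) + e with e ~ N(0,1) under (theta_0, eta_0);
    d stands for (eta - eta_0)(V), and theta_n(h) = theta_0 + h / sqrt(n).
    """
    residual = e - h * u / math.sqrt(n) - d  # Y - theta_n(h)*U - eta(V)
    return 0.5 * residual ** 2 - 0.5 * e ** 2


def sup_closed_form(M, n, u, e, d):
    """Right-hand side of the display: the supremum over |h| <= M."""
    return ((M ** 2 / (2 * n)) * u ** 2
            + (M / math.sqrt(n)) * abs(u * (e - d))
            - e * d + 0.5 * d ** 2)


def sup_on_grid(M, n, u, e, d, points=2001):
    """Brute-force supremum over an equispaced grid on [-M, M], endpoints included."""
    grid = [-M + 2 * M * i / (points - 1) for i in range(points)]
    return max(neg_log_ratio(h, n, u, e, d) for h in grid)


def max_discrepancy(trials=200, seed=0):
    """Largest gap between grid supremum and closed form over random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        u, e, d = rng.gauss(0, 1), rng.gauss(0, 1), rng.uniform(-1, 1)
        n, M = rng.randint(1, 500), rng.uniform(0.1, 5.0)
        gap = abs(sup_on_grid(M, n, u, e, d) - sup_closed_form(M, n, u, e, d))
        worst = max(worst, gap)
    return worst
```

Calling `max_discrepancy()` returns the largest absolute gap over the random inputs, which should be at the level of floating-point rounding if the closed form is right.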

In the following corollary we choose a prior by picking a suitable $k$ in (7.2)
and conditioning on $\|\eta\|_\alpha < M$. The resulting prior is shown to be well
defined below and is denoted $\Pi^k_{\alpha,M}$.
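
Conditioning a Gaussian prior on a norm bound can be pictured as rejection sampling. The sketch below is illustrative only and rests on assumptions not stated in this section: that (7.2) is $k$-fold integrated Brownian motion plus a polynomial of degree $k$ with independent standard normal coefficients, that paths are discretised on a finite grid, and that for $1/2 < \alpha \leq 1$ the $\alpha$-norm is the uniform norm plus the Hölder seminorm, evaluated on grid points only. All function names are hypothetical.

```python
import math
import random


def integrated_bm_path(k, grid, rng):
    """Draw one discretised path on `grid` (increasing, starting at 0)."""
    # Brownian motion started at 0, built from independent Gaussian increments.
    path = [0.0]
    for t0, t1 in zip(grid, grid[1:]):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(t1 - t0)))
    # Integrate k times with the trapezoidal rule.
    for _ in range(k):
        integral = [0.0]
        for i in range(1, len(grid)):
            step = grid[i] - grid[i - 1]
            integral.append(integral[-1] + 0.5 * (path[i] + path[i - 1]) * step)
        path = integral
    # "Started at random" (assumed form): add sum_i Z_i t^i / i!, Z_i i.i.d. N(0,1).
    coef = [rng.gauss(0.0, 1.0) for _ in range(k + 1)]
    return [p + sum(c * t ** i / math.factorial(i) for i, c in enumerate(coef))
            for p, t in zip(path, grid)]


def holder_norm(path, grid, alpha):
    """Grid version of sup|eta| + sup_{s<t} |eta(t) - eta(s)| / (t - s)^alpha."""
    sup_part = max(abs(p) for p in path)
    seminorm = max(abs(path[j] - path[i]) / (grid[j] - grid[i]) ** alpha
                   for i in range(len(grid)) for j in range(i + 1, len(grid)))
    return sup_part + seminorm


def sample_conditioned_prior(k, alpha, M, n_draws, grid, rng, max_tries=10_000):
    """Rejection sampling: keep only paths whose discretised alpha-norm is below M."""
    draws = []
    for _ in range(max_tries):
        if len(draws) == n_draws:
            break
        eta = integrated_bm_path(k, grid, rng)
        if holder_norm(eta, grid, alpha) < M:
            draws.append(eta)
    return draws
```

For instance, `sample_conditioned_prior(1, 0.75, 5.0, 10, [i / 50 for i in range(51)], random.Random(0))` returns a list of accepted paths. The loop can only succeed because the conditioning event has positive prior probability, which is what the proof of the corollary establishes for the undiscretised prior.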

Corollary 7.2. Let $\alpha > 1/2$ and $M > 0$ be given; choose $H = \{\eta \in C^\alpha[0,1] : \|\eta\|_\alpha < M\}$ and assume that $\eta_0 \in C^\alpha[0,1]$. Suppose the distribution
of the covariates $(U,V)$ is as in Theorem 7.1. Then, for any integer $k > \alpha - 1/2$,
the conditioned prior $\Pi^k_{\alpha,M}$ is well defined and gives rise to a marginal
posterior for $\theta$ satisfying (7.3).

Proof. Choose $k$ as indicated; the Gaussian distribution of $\eta$ over
$C[0,1]$ is based on the RKHS $H^{k+1}[0,1]$ and denoted $\Pi^k$. Since $\eta$ in (7.2) has
smoothness $k + 1/2 > \alpha$, $\Pi^k(\eta \in C^\alpha[0,1]) = 1$. Hence, one may also view $\eta$
as a Gaussian element in the Hölder class $C^\alpha[0,1]$, which forms a separable
Banach space even with strengthened norm $\|\eta\| = \|\eta\|_\infty + \|\eta\|_\alpha$, without
changing the RKHS. The trivial embedding of $C^\alpha[0,1]$ into $C[0,1]$ is one-to-one
and continuous, enabling identification of the prior induced by $\eta$ on
$C^\alpha[0,1]$ with the prior $\Pi^k$ on $C[0,1]$. Given $\eta_0 \in C^\alpha[0,1]$ and a sufficiently
smooth kernel $\phi_\sigma$ with bandwidth $\sigma > 0$, consider $\phi_\sigma \ast \eta_0 \in H^{k+1}[0,1]$. Since
$\|\eta_0 - \phi_\sigma \ast \eta_0\|_\infty$ is of order $\sigma^\alpha$, and a similar bound exists for the $\alpha$-norm of
the difference [44], $\eta_0$ lies in the closure of the RKHS both with respect to
$\|\cdot\|_\infty$ and to $\|\cdot\|$. In particular, $\eta_0$ lies in the support of $\Pi^k$, in $C^\alpha[0,1]$ with
norm $\|\cdot\|$. Hence, $\|\cdot\|$-balls centered on $\eta_0$ receive nonzero prior mass, that
is, $\Pi^k(\|\eta - \eta_0\| < \rho) > 0$ for all $\rho > 0$. Therefore, $\Pi^k(\|\eta - \eta_0\|_\infty < \rho, \|\eta\|_\alpha < \|\eta_0\|_\alpha + \rho) > 0$, which guarantees that $\Pi^k(\|\eta - \eta_0\|_\infty < \rho, \|\eta\|_\alpha < M) > 0$, for
small enough $\rho > 0$. This implies that $\Pi^k(\|\eta\|_\alpha < M) > 0$, and

$$\Pi^k_{\alpha,M}(B) = \Pi^k\bigl(B \mid \|\eta\|_\alpha < M\bigr)$$

is well defined for all Borel-measurable $B \subset C[0,1]$. Moreover, it follows
that $\Pi^k_{\alpha,M}(\|\eta - \eta_0\|_\infty < \rho) > 0$ for all $\rho > 0$. We conclude that $k$ times
integrated Brownian motion started at random, conditioned to be bounded
by $M$ in $\alpha$-norm, gives rise to a prior that satisfies $\mathrm{supp}(\Pi^k_{\alpha,M}) = H$. As
is well known [45], the entropy numbers of $H$ with respect to the uniform
norm satisfy, for every $\rho > 0$, $\log N(\rho, H, \|\cdot\|_\infty) \leq K\rho^{-1/\alpha}$, for some constant
$K > 0$ that depends only on $\alpha$ and $M$. The associated bound on the bracketing
entropy gives rise to finite bracketing integrals, so that $H$ is universally
Donsker. Then, if the distribution of the covariates $(U,V)$ is as assumed in
Theorem 7.1, the Bernstein-von Mises limit (7.3) holds. □
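
The role of the restriction $\alpha > 1/2$ in the Donsker step can be made explicit. The following is a sketch, assuming the entropy bound in the form $\log N(\rho, H, \|\cdot\|_\infty) \leq K\rho^{-1/\alpha}$, that a uniform-norm cover of radius $\rho/2$ yields $L_2(P)$-brackets of size at most $\rho$ for any probability measure $P$, and, for simplicity, that the bracketing integral is taken over $(0,1]$.

```latex
\begin{align*}
J_{[\,]}\bigl(1, H, L_2(P)\bigr)
 &= \int_0^1 \sqrt{\log N_{[\,]}\bigl(\rho, H, L_2(P)\bigr)}\,d\rho
 \;\leq\; \int_0^1 \sqrt{K\,(\rho/2)^{-1/\alpha}}\,d\rho \\
 &= \sqrt{K}\,2^{1/(2\alpha)} \int_0^1 \rho^{-1/(2\alpha)}\,d\rho
 \;=\; \sqrt{K}\,2^{1/(2\alpha)}\,\frac{2\alpha}{2\alpha - 1} .
\end{align*}
```

The integral converges precisely when $1/(2\alpha) < 1$, that is, when $\alpha > 1/2$, and the bound does not depend on $P$, which is consistent with the universal Donsker property claimed in the proof.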